Gender Prediction Based on Names Using a Combination of IndoBERT, Convolutional Neural Network (CNN), and Bidirectional Long Short-Term Memory (BiLSTM) Models

Authors

  • Abi Mas'ud, Magister Ilmu Komputer, Universitas Bumigora
  • Bambang Krismono Triwijoyo, Magister Ilmu Komputer, Universitas Bumigora
  • Dadang Priyanto, Magister Ilmu Komputer, Universitas Bumigora

DOI:

https://doi.org/10.35746/jtim.v7i3.736

Keywords:

Deep Learning, CNN, IndoBERT, BiLSTM, Gender Prediction, Text Classification

Abstract

This study proposes a gender prediction model for Indonesian names that combines three architectures: Indonesian Bidirectional Encoder Representations from Transformers (IndoBERT), Convolutional Neural Network (CNN), and Bidirectional Long Short-Term Memory (BiLSTM). Indonesian names are diverse and follow no standardized structure, which makes text-based gender classification difficult. To address this, the hybrid approach draws on the contextual representations of IndoBERT, the local pattern extraction of CNN, and the sequential dependency modeling of BiLSTM. The dataset consists of 4,796 student names from Universitas Bumigora, collected between 2018 and 2023. Preprocessing includes lowercasing, punctuation removal, label encoding, and train-test splitting. Evaluated on accuracy, precision, recall, and F1-score, the IndoBERT-CNN-BiLSTM model achieved the best performance, with an accuracy of 90.94% and an F1-score of 91.03%, and its training was stable with no signs of overfitting. The model is effective for name-based gender classification and has potential applications in population information systems, service personalization, and name-based demographic analysis.
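The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the function names, the example names and labels, the 80/20 split ratio, the random seed, and the choice of a single positive class for precision and recall are all assumptions made for illustration, since the abstract does not specify them.

```python
import random
import string


def normalize_name(name: str) -> str:
    """Lowercase a name, strip ASCII punctuation, and collapse extra spaces."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(name.lower().translate(table).split())


def encode_labels(labels):
    """Map each distinct label to an integer id (sorted for determinism)."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels], mapping


def train_test_split(items, test_ratio=0.2, seed=42):
    """Shuffle a copy of the items and split off a test fraction.

    The 0.2 ratio and the seed are assumptions, not values from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical records; the study's dataset is not reproduced here.
    records = [("Siti Nurhaliza", "female"), ("Budi Santoso", "male"),
               ("Dewi Lestari", "female"), ("Agus Salim", "male"),
               ("Putri Ayu", "female")]
    names = [normalize_name(name) for name, _ in records]
    ids, mapping = encode_labels([label for _, label in records])
    train, test = train_test_split(list(zip(names, ids)))
    print(mapping, len(train), len(test))
```

Note that removing punctuation joins hyphenated or apostrophized name parts (for example, "Mas'ud" becomes "masud"); whether that is desirable depends on the tokenizer used downstream.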

Published

2025-06-18

Section

Articles

How to Cite

[1] A. Mas'ud, B. K. Triwijoyo, and D. Priyanto, “Prediksi Gender Berdasarkan Nama Menggunakan Kombinasi Model IndoBERT, Convolutional Neural Network (CNN) dan Bidirectional Long Short-Term Memory (BiLSTM)”, jtim, vol. 7, no. 3, pp. 448–460, Jun. 2025, doi: 10.35746/jtim.v7i3.736.