The Effect of Bag-of-Words and TF-IDF Text Representation Techniques on the Accuracy of Multi-Domain Text Sentiment Classification

Authors

  • Angelica Davina Meisya Putri, Computer Science Study Program, Faculty of Engineering, Universitas Bumigora Mataram
  • Neny Sulistianingsih, Master of Computer Science Study Program, Graduate Program, Universitas Bumigora Mataram
  • Ria Rismayati, Computer Science Study Program, Faculty of Engineering, Universitas Bumigora Mataram

DOI:

https://doi.org/10.35746/jtim.v7i4.756

Keywords:

Sentiment Analysis, Bag-of-Words, TF-IDF, Text Classification

Abstract

Text representation is an essential component of sentiment analysis systems because it determines how text data is converted into numerical features that classification algorithms can use. This study analyzes the effect of two popular text representation techniques, Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF), on the performance of short-text sentiment classification in a multi-domain setting. The dataset combines original data with synonym-based augmented data, for a total of 418 text entries. Two machine learning algorithms were evaluated: Ridge Classifier and Complement Naïve Bayes. Evaluation used Stratified K-Fold cross-validation and four main metrics: accuracy, precision, recall, and F1-score. The experimental results show that the TF-IDF representation consistently outperformed BoW on both models. The best configuration was Ridge Classifier with TF-IDF, which achieved an accuracy of 0.911 and an F1-score of 0.908. These findings underscore the importance of choosing an appropriate feature representation technique for improving the effectiveness of text-based sentiment classification systems.
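To make the difference between the two representations concrete, the following is a minimal from-scratch sketch in Python. It is not the authors' implementation: the three toy documents, the whitespace tokenizer, the helper names (tokenize, bag_of_words, tf_idf), and the unsmoothed weighting tf × log(N / df) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split; real pipelines clean more.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, raw term-count vectors) for a list of strings."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, TF-IDF vectors) using tf * log(N / df)."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, vectors


# Hypothetical toy corpus, not taken from the study's dataset.
docs = ["good movie good plot", "bad movie", "good food"]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)

print(vocab)
print(bow[0])
print([round(v, 3) for v in tfidf[0]])
```

In the first document, BoW gives "movie" and "plot" the same count, while TF-IDF weights "plot" higher because it appears in fewer documents. Library implementations may differ from this sketch; for example, scikit-learn's TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its values will not match these.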

Published

2025-10-01

Section

Articles

How to Cite

[1]
A. D. M. Putri, N. Sulistianingsih, and R. Rismayati, “Pengaruh Teknik Representasi Teks Bag-of-Words dan TF-IDF terhadap Akurasi Klasifikasi Sentimen Teks Multi-Domain,” jtim, vol. 7, no. 4, pp. 675–688, Oct. 2025, doi: 10.35746/jtim.v7i4.756.