Analysis of the Effect of Training and Testing Data Composition on the Use of PCA and the Decision Tree Algorithm for Classifying Liver Disease Patients

Baiq Nurul Azmi; Arief Hermawan; Donny Avianto

doi:10.35746/jtim.v4i4.298

Baiq Nurul Azmi Universitas Teknologi Yogyakarta https://orcid.org/0000-0002-2154-8812
Arief Hermawan Universitas Teknologi Yogyakarta http://orcid.org/0000-0002-7061-8274
Donny Avianto Universitas Teknologi Yogyakarta http://orcid.org/0000-0001-5499-5478

Keywords: PCA, Decision Tree, Liver Disease, Training, Testing

Abstract

Liver disease is difficult to detect and is among the largest contributors to deaths because it is a silent killer that often progresses without symptoms. It can be detected from abnormal levels of substances in the human body. The Indian Liver Patient Dataset (ILPD) contains many variables describing these levels in liver patients, and the variables are used as parameters for classifying liver disease patients. Previous studies have shown that only two variables are influential in the ILPD. This study examines the use of Principal Component Analysis (PCA) to determine the optimal number of features for liver disease classification, and examines which split between training and testing data produces the best accuracy. The ILPD was obtained from the UCI Machine Learning Repository and contains 583 rows and 11 features. The training-to-testing splits used are 50%:50%, 60%:40%, 70%:30%, 73%:27%, 75%:25%, 80%:20%, 83%:17%, 85%:15% and 90%:10%. These splits are compared to find the one that gives the best accuracy. The features produced by PCA are used as input to the C4.5 Decision Tree classification algorithm. The experimental results show that the highest accuracy, 78.40%, is obtained with the 90%:10% training-to-testing split after applying PCA with n = 8 components.
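To make the split ratios concrete, the following minimal sketch computes how many of the 583 rows each ratio would assign to training and testing, and shows one way to draw such a split. The function names `split_sizes` and `split_indices`, the fixed seed, and the choice to round the testing share up are assumptions made for illustration; the abstract does not state how the authors rounded or shuffled the data.

```python
import random

N_ROWS = 583  # dataset size stated in the abstract

# Training percentages listed in the abstract; the testing share is the remainder.
TRAIN_PERCENTS = [50, 60, 70, 73, 75, 80, 83, 85, 90]


def split_sizes(n_rows, train_percent):
    """Return (n_train, n_test), rounding the testing share up (an assumption)."""
    test_percent = 100 - train_percent
    n_test = (n_rows * test_percent + 99) // 100  # integer ceiling division
    return n_rows - n_test, n_test


def split_indices(n_rows, train_percent, seed=0):
    """Shuffle row indices and cut them into training and testing lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # hypothetical seed, for repeatability
    n_train, _ = split_sizes(n_rows, train_percent)
    return indices[:n_train], indices[n_train:]


if __name__ == "__main__":
    for pct in TRAIN_PERCENTS:
        n_train, n_test = split_sizes(N_ROWS, pct)
        print(f"{pct}%:{100 - pct}% -> train={n_train}, test={n_test}")
```

Under this rounding assumption, the 90%:10% split leaves 59 rows for testing, so an accuracy measured on it rests on a small sample.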

Author Biography

Baiq Nurul Azmi, Universitas Teknologi Yogyakarta

Master of Information Technology Study Program, accredited "Baik Sekali" (Very Good), Universitas Teknologi Yogyakarta
