Implementation of Split Sampling for Decision Tree and K-Nearest Neighbor Algorithms in the DKI Jakarta Legislative Election

Authors

  • Bambang Wisnu Widagdo, Universitas Pamulang, Indonesia
  • Muhammad Rizky Fadillah, Universitas Pamulang, Indonesia
  • Mochamad Adhari Adiguna, Universitas Pamulang, Indonesia
  • Sudarno Wiharjo, Universitas Pamulang, Indonesia
  • Murni Handayani, Universitas Pamulang, Indonesia

DOI:

https://doi.org/10.54082/jupin.1704

Keywords:

Data Mining, Decision Tree, K-Nearest Neighbor, Linear Sampling, Shuffled Sampling, Stratified Sampling

Abstract

The 2009 legislative election was contested by 44 political parties, comprising national and local parties. In the 2009 Legislative Election for DKI Jakarta, 2,268 candidates from 44 parties competed for 94 seats in the DKI Jakarta Regional People's Representative Council (DPRD). Data mining is a series of processes for extracting added value from a database in the form of information that was previously unknown and could not be obtained manually. Classification methods can be used to predict the results of legislative elections; in this study, the authors employ the Decision Tree and K-Nearest Neighbor (KNN) classification algorithms. Three data sampling techniques are compared: Linear Sampling, Shuffled Sampling, and Stratified Sampling. The data were split into 80% for training and 20% for testing, the experiments were carried out in RapidMiner, and performance was measured by Recall, Precision, and Accuracy.

The results indicate that, overall, Linear Split Sampling outperforms Shuffled Split Sampling and Stratified Split Sampling: for both algorithms it achieved the highest or tied-highest Recall and Accuracy, although its Precision was the lowest or tied-lowest. For the Decision Tree algorithm, Linear Split Sampling achieved a Recall of 100%, Precision of 82.05%, and Accuracy of 98.46%; Shuffled Split Sampling recorded a Recall of 81.82%, Precision of 85.71%, and Accuracy of 98.46%; and Stratified Split Sampling obtained a Recall of 100%, Precision of 82.05%, and Accuracy of 97.80%. For the KNN algorithm, Linear Split Sampling achieved a Recall of 93.75%, Precision of 75%, and Accuracy of 97.36%; Shuffled Split Sampling recorded a Recall of 59.09%, Precision of 81.25%, and Accuracy of 97.36%; and Stratified Split Sampling obtained a Recall of 31.58%, Precision of 85.71%, and Accuracy of 96.92%.
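The three split modes can be illustrated with a minimal Python sketch of an 80/20 split. It assumes the behavior described in the RapidMiner Split Data documentation: linear sampling keeps the original row order, shuffled sampling randomizes the rows, and stratified sampling randomizes within each class so both partitions keep the class proportions. This is not RapidMiner's implementation; the function names, the fixed seed, and the toy candidate rows labeled "elected"/"not elected" are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical helpers that mimic the three split modes; not RapidMiner's own code.


def linear_split(rows, train_ratio=0.8):
    """Keep the original row order: the first 80% go to training, the rest to testing."""
    cut = round(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]


def shuffled_split(rows, train_ratio=0.8, seed=42):
    """Randomly reorder a copy of the rows, then cut it like a linear split."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    return linear_split(shuffled, train_ratio)


def stratified_split(rows, label_of, train_ratio=0.8, seed=42):
    """Shuffle and cut each class separately so both partitions keep the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    train, test = [], []
    for label in sorted(by_class):  # fixed class order keeps the result reproducible
        group = list(by_class[label])
        rng.shuffle(group)
        cut = round(len(group) * train_ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def label_of(row):
    # Each hypothetical row is (candidate_id, label).
    return row[1]


# Hypothetical toy data: 90 "not elected" rows followed by 10 "elected" rows.
rows = [(i, "not elected") for i in range(90)] + [(i, "elected") for i in range(90, 100)]

splits = {
    "linear": linear_split(rows),
    "shuffled": shuffled_split(rows),
    "stratified": stratified_split(rows, label_of),
}
for name, (train, test) in splits.items():
    elected_in_test = sum(label_of(r) == "elected" for r in test)
    print(f"{name}: train={len(train)} test={len(test)} elected_in_test={elected_in_test}")
```

Because the toy rows are sorted by label, the linear split sends all 10 "elected" rows to the test set and none to training, while the stratified split keeps 8 in training and 2 in testing, the same 10% share as the full data. The outcome of a linear split therefore depends on how the rows happen to be ordered in the source file.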

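Recall, Precision, and Accuracy are computed from confusion-matrix counts as Recall = TP / (TP + FN), Precision = TP / (TP + FP), and Accuracy = (TP + TN) / (TP + FP + FN + TN). The Python sketch below assumes that "elected" is the positive class (the abstract does not name it) and uses hypothetical labels and predictions; it shows how Accuracy can stay high while Recall is low when the positive class is rare.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, FP, FN, TN), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # actually positive, predicted negative
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    # Share of actual positives that the model found; 0.0 if there are none.
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    # Share of positive predictions that were correct; 0.0 if nothing was predicted positive.
    return tp / (tp + fp) if tp + fp else 0.0


def accuracy(tp, fp, fn, tn):
    # Share of all test rows that were classified correctly.
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical test set: 10 elected and 190 not-elected candidates.
actual = ["elected"] * 10 + ["not elected"] * 190
# Hypothetical predictions: 3 hits, 7 misses, 1 false alarm, 189 correct rejections.
predicted = ["elected"] * 3 + ["not elected"] * 7 + ["elected"] + ["not elected"] * 189

tp, fp, fn, tn = confusion_counts(actual, predicted, "elected")
print(f"Recall={recall(tp, fn):.2%} Precision={precision(tp, fp):.2%} "
      f"Accuracy={accuracy(tp, fp, fn, tn):.2%}")
# prints: Recall=30.00% Precision=75.00% Accuracy=96.00%
```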
References

Han, J., & Kamber, M. (2006). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann.

Han, J., Kamber, M., & Pei, J. (2011). Data mining: Concepts and techniques (3rd ed.). Elsevier.

Hermawati, F. A. (2013). Data mining (1st ed.). Andi Offset.

Kusrini, & Luthfi, E. T. (2009). Algoritma data mining (1st ed.). Andi Offset.

Kusnawi. (2007, November 24). Pengantar solusi data mining. Paper presented at Seminar Nasional Teknologi STMIK AMIKOM Yogyakarta, Yogyakarta, Indonesia.

Larose, D. T. (2005). Discovering knowledge in data: An introduction to data mining (2nd ed.). Wiley Interscience.

Republik Indonesia. (2008). Undang-undang No. 10 Tahun 2008 tentang Pemilihan Umum Anggota DPR, DPD, dan DPRD Pasal 5 Ayat 2. Lembaran Negara RI Tahun 2008. Sekretariat Negara.

Republik Indonesia. (2008). Undang-undang No. 10 Tahun 2008 tentang Pemilihan Umum Anggota DPR, DPD, dan DPRD Pasal 12. Lembaran Negara RI Tahun 2008. Sekretariat Negara.

RapidMiner Documentation. (n.d.). Split Data. Retrieved from https://docs.rapidminer.com/latest/studio/operators/blending/examples/sampling/split_data.html

Habibi, M. (2016). Prediksi hasil pemilihan umum legislatif DPRD Provinsi Jawa Tengah menggunakan metode decision tree dan algoritma C4.5 [Undergraduate thesis, Universitas Dian Nuswantoro Semarang].

Faid, M., Syahputra, M., & Wahyudi, H. (2019). Perbandingan kinerja tool data mining Weka dan RapidMiner dalam algoritma klasifikasi. Universitas Nurul Jadid.

Franseda, A., Subiyanto, & Sari, R. F. (2020). Integrasi metode Decision Tree dan SMOTE untuk klasifikasi data kecelakaan lalu lintas. Jurnal Sistem dan Teknologi Informasi, 8.

Listyaningrum, S. (2021). Penerapan data mining untuk analisis karakteristik DPT non-participate sebagai prediksi partisipan pemilu dengan menggunakan metode Naive Bayes Classifier [Undergraduate thesis, Universitas Dian Nuswantoro].

Nanfack, W., Gaur, M., Almeida, F., Namoun, A., & Purohit, H. (2023). Decision trees: From efficient prediction to responsible AI. Patterns, 4(5), 100740. https://doi.org/10.1016/j.patter.2023.100740

Vitalaya, N. A. R. (2024). Perbandingan tipe sampling pada klasifikasi minat TIK [Master’s thesis, UIN Syarif Hidayatullah Jakarta].

Published

11-11-2025

How to Cite

Widagdo, B. W., Fadillah, M. R., Adiguna, M. A., Wiharjo, S., & Handayani, M. (2025). Implementation of Split Sampling for Decision Tree and K-Nearest Neighbor Algorithms in the DKI Jakarta Legislative Election. Jurnal Penelitian Inovatif, 5(4), 2979–2986. https://doi.org/10.54082/jupin.1704