Machine Learning Algorithm Comparison using Sampling Techniques for Car Insurance Claim Classification
Keywords:
Car Insurance Claim, Classification, Decision Tree, K-Nearest Neighbors, Logistic Regression, Naïve Bayes, Random Forest, Sampling Techniques
Abstract
This study compares the performance of machine learning algorithms for classifying car insurance claims when sampling techniques are used to address class imbalance in the claim data, an imbalance that can lead to errors in detecting fraudulent claims. Oversampling and undersampling are applied to the Decision Tree, Random Forest, Naïve Bayes, K-Nearest Neighbors, and Logistic Regression algorithms. The methodology follows the Knowledge Discovery in Databases (KDD) process and uses RapidMiner version 10.1 to build the classification models from the insurance claim data. The evaluation shows that K-Nearest Neighbors (K-NN) with oversampling achieves the highest performance in predicting insurance claims: accuracy of 90.46%, recall of 99.03% for the Yes class and 81.88% for the No class, precision of 84.53% for the Yes class and 98.83% for the No class, and an AUC of 0.984. The performance comparison and its visualization further indicate that Random Forest (RF) with oversampling and K-NN with oversampling give the most promising results for predicting car insurance claims.
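As a rough illustration of the two sampling ideas and the per-class metrics named in the abstract, the following Python sketch may help. It is not the study's RapidMiner workflow: the abstract does not say which oversampling or undersampling variant was used, so plain random duplication and random removal are assumed here, and the helper names (`random_oversample`, `random_undersample`, `binary_metrics`) are hypothetical. The confusion-matrix counts are also hypothetical; they assume 10,000 instances per class, chosen only so that the reported recalls become whole numbers.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest class (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class
    (assumed variant)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


def binary_metrics(tp, fn, fp, tn):
    """Accuracy plus per-class recall and precision, with 'Yes' as the
    positive class."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall_yes": tp / (tp + fn),
        "recall_no": tn / (tn + fp),
        "precision_yes": tp / (tp + fp),
        "precision_no": tn / (tn + fn),
    }


# Toy imbalanced data: 8 "No" rows and 2 "Yes" rows (illustrative only).
rows = list(range(10))
labels = ["No"] * 8 + ["Yes"] * 2

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
over_counts = Counter(over_labels)    # both classes now have 8 rows
under_counts = Counter(under_labels)  # both classes now have 2 rows

# Hypothetical counts: 10,000 rows per class with the reported recalls.
metrics = binary_metrics(tp=9903, fn=97, fp=1812, tn=8188)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under the equal-class-size assumption, the reported recalls of 99.03% and 81.88% reproduce the reported accuracy and both reported precisions to within rounding. The abstract does not state how the evaluation data were split, so this is an arithmetic consistency check rather than a description of the authors' setup.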
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others remix, transform, or build upon the material, provided they give appropriate credit, provide a link to the license, indicate if changes were made, and distribute their contributions under the same license as the original.