Machine Learning Algorithm Comparison using Sampling Techniques for Car Insurance Claim Classification

Authors

  • Gerry Geraldo German, Dinar Ajeng Kristiyanti

Keywords:

Car Insurance Claim, Classification, Decision Tree, K-Nearest Neighbors, Logistic Regression, Naïve Bayes, Random Forest, Sampling Techniques

Abstract

This research compares the performance of machine learning algorithms for classifying car insurance claims when sampling techniques are applied. The study addresses class imbalance in insurance claim data, which can lead to errors when detecting fraudulent claims. Oversampling and undersampling are applied to five algorithms: Decision Tree, Random Forest (RF), Naïve Bayes, K-Nearest Neighbors (K-NN), and Logistic Regression. The methodology follows the Knowledge Discovery in Databases (KDD) process and uses RapidMiner version 10.1 to build the classification models from the insurance claim data. The evaluation shows that K-NN with oversampling achieves the highest performance in predicting insurance claims: an accuracy of 90.46%, recall of 99.03% for the Yes class and 81.88% for the No class, precision of 84.53% for the Yes class and 98.83% for the No class, and an AUC of 0.984. The evaluation and visualization of the performance comparisons further indicate that RF with oversampling and K-NN with oversampling give the most promising results for predicting car insurance claims.
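The sampling and evaluation ideas named in the abstract can be illustrated with a minimal Python sketch. The study itself used RapidMiner, so this is not the authors' workflow. It assumes plain random oversampling and random undersampling, since the abstract does not say which variants were used. The function names (`random_oversample`, `random_undersample`, `class_metrics`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        out_rows.extend(group + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of larger classes, down to the smallest class size."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(group) for group in groups.values())
    out_rows, out_labels = [], []
    for label, group in groups.items():
        out_rows.extend(rng.sample(group, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def class_metrics(y_true, y_pred, positive):
    """Return (precision, recall) for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: 8 "No" claims and 2 "Yes" claims (imbalanced).
toy_rows = [[i] for i in range(10)]
toy_labels = ["No"] * 8 + ["Yes"] * 2

over_rows, over_labels = random_oversample(toy_rows, toy_labels)
under_rows, under_labels = random_undersample(toy_rows, toy_labels)
over_counts = Counter(over_labels)    # both classes now have 8 rows
under_counts = Counter(under_labels)  # both classes now have 2 rows

# Hypothetical predictions, used only to show the per-class metrics.
y_true = ["Yes", "Yes", "No", "No", "No"]
y_pred = ["Yes", "No", "No", "No", "Yes"]
yes_precision, yes_recall = class_metrics(y_true, y_pred, positive="Yes")
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Per-class recall and precision are reported alongside accuracy because accuracy alone can hide poor performance on the minority class. As a side observation, on a perfectly balanced evaluation set accuracy equals the mean of the two class recalls. The reported values give (99.03 + 81.88) / 2 = 90.455, which is close to the reported accuracy of 90.46%. That is consistent with a balanced evaluation set but does not establish how the evaluation was actually carried out.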

Published

12.06.2024

How to Cite

Gerry Geraldo German. (2024). Machine Learning Algorithm Comparison using Sampling Techniques for Car Insurance Claim Classification. International Journal of Intelligent Systems and Applications in Engineering, 12(4), 2294 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6616

Section

Research Article