Performance Analysis of Chronic Kidney Disease Detection Based on K-Nearest Neighbors Data Mining

Authors

  • Mohtady Ehab Barakat, Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia.
  • Chung Gwo Chin, Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia.
  • Lee It Ee, Faculty of Engineering, Multimedia University, 63100 Cyberjaya, Malaysia.

Keywords:

Chronic kidney disease, data mining, K-Nearest Neighbors, linear regression, decision tree

Abstract

Kidney diseases are a leading cause of death in the United States. According to the Centers for Disease Control and Prevention (CDC), in 2021 approximately 37 million US adults, or 1 in 7, were estimated to have chronic kidney disease (CKD), and most were undiagnosed. Moreover, Medicare costs for people with CKD were $87.2 billion in 2019. Data mining has therefore been used in the healthcare industry to help authorities provide patients with health information and identify patients earlier. In this paper, data mining is applied to the classification of laboratory data from CKD patients. The K-Nearest Neighbors (KNN) algorithm is proposed to train a machine learning model that detects CKD from blood test results such as sugar count, white blood cell count, red blood cell count, hemoglobin, and albumin. The model also includes general factors such as age and blood pressure. The results show that other machine learning methods, such as linear regression and decision tree, produce inferior accuracy. Trained with KNN on a dataset of 400 anonymous patients, the model reaches an accuracy of 99%. Based on the prediction, around 40% of the patients are fully healthy. The aim of this paper is to detect whether a patient has CKD from lab results and general information about the patient.
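
As a rough illustration of the KNN classification step described in the abstract, the following is a minimal pure-Python sketch. It is not the authors' implementation: the feature set (age, blood pressure, hemoglobin, albumin), the six toy records, the labels `ckd` and `notckd`, the choice of k = 3, and the min-max scaling step are all assumptions made for illustration and are not taken from the paper's 400-patient dataset.

```python
import math
from collections import Counter

# Hypothetical toy records: [age, blood pressure, hemoglobin, albumin].
# These values are invented for illustration and are NOT from the paper's dataset.
TRAIN_ROWS = [
    [62, 90, 9.5, 3],
    [55, 100, 8.8, 4],
    [70, 80, 10.1, 2],
    [35, 70, 15.0, 0],
    [42, 80, 14.2, 0],
    [28, 60, 16.1, 0],
]
TRAIN_LABELS = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]


def min_max_scale(rows):
    """Rescale each feature column to [0, 1] using the training rows.

    Returns the scaled rows and a function that applies the same
    scaling to a new record. Scaling is an assumption of this sketch;
    it keeps large-valued features from dominating the distance.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]

    def scale(row):
        return [
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ]

    return [scale(row) for row in rows], scale


def knn_predict(train_rows, train_labels, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    # Euclidean distance from the query to every training row, nearest first.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


scaled_rows, scale = min_max_scale(TRAIN_ROWS)
new_patient = [60, 90, 9.9, 3]  # hypothetical unseen record
print(knn_predict(scaled_rows, TRAIN_LABELS, scale(new_patient), k=3))
```

In practice the value of k and the distance measure would be tuned on held-out data, and accuracy would be measured on records not used for training.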

Published

11.07.2023

How to Cite

Barakat, M. E., Chin, C. G., & Ee, L. I. (2023). Performance Analysis of Chronic Kidney Disease Detection Based on K-Nearest Neighbors Data Mining. International Journal of Intelligent Systems and Applications in Engineering, 11(8s), 393–400. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3065