Holistic Integration of Clustering Algorithms for Improved Diabetes Prediction

Authors

  • Rita Ganguly, Dharmpal Singh, Rajesh Bose

Keywords:

Diabetes detection, Machine learning, K-Means, Ensemble learning, Healthcare.

Abstract

This paper presents a method for improving the accuracy and reliability of diabetes risk prediction models. It combines the K-Means, Fuzzy C-Means (FCM), and hierarchical clustering algorithms to handle the complexity of the diabetes-related data in the PIMA Indian Diabetes dataset. Data pre-processing, including handling missing values and standardizing features, lays the foundation for this integrated clustering approach. The evaluation compares the integrated approach with each clustering method used on its own and with conventional prediction models, and reports a notable improvement in accuracy and robustness. These results suggest that combining diverse clustering techniques benefits healthcare analytics by giving a more detailed picture of patient data and supporting early detection of diabetes risk.
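The abstract does not specify how the pre-processing or the integration of the three clusterings is carried out, so the following is only a minimal sketch under stated assumptions. It assumes that zero entries in physiological columns stand for missing values and are filled with the column median, that features are z-scored, and that the clusterings are combined through a co-association matrix (the fraction of clusterings that place two samples in the same cluster). The function names, the toy rows, and the three label vectors are hypothetical and are not taken from the paper.

```python
from statistics import mean, median, pstdev


def impute_zeros_with_median(rows, cols):
    """Replace 0 entries in the given columns with the median of the non-zero values."""
    out = [list(r) for r in rows]
    for c in cols:
        observed = [r[c] for r in rows if r[c] != 0]
        fill = median(observed)
        for r in out:
            if r[c] == 0:
                r[c] = fill
    return out


def standardize(rows):
    """Z-score each column using the population standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(r, mus, sigmas)] for r in rows]


def co_association(labelings):
    """Fraction of labelings that put samples i and j in the same cluster."""
    n = len(labelings[0])
    k = len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / k for j in range(n)]
            for i in range(n)]


# Toy rows of (glucose, BMI); the values are illustrative, not real records.
rows = [(148, 33.6), (85, 26.6), (0, 23.3), (89, 0.0)]
clean = impute_zeros_with_median(rows, cols=[0, 1])
scaled = standardize(clean)

# Hypothetical label vectors, standing in for the output of K-Means, FCM
# (after taking the highest membership), and hierarchical clustering.
labelings = [[0, 1, 1, 1], [1, 0, 0, 0], [0, 1, 0, 1]]
agreement = co_association(labelings)
```

The co-association matrix does not depend on how each algorithm numbers its clusters, which is why the first two label vectors above count as identical partitions. A consensus partition could then be obtained by clustering this matrix, although whether the paper does so is not stated in the abstract.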

Published

24.03.2024

How to Cite

Rita Ganguly. (2024). Holistic Integration of Clustering Algorithms for Improved Diabetes Prediction. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3200–3206. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5925

Section

Research Article