A Comprehensive Review on Diabetes Disease Prediction and Risk Modeling Using Machine Learning Algorithms

Authors

  • Abdul Aamir Khan, B. K. Sharma

Keywords:

Clinical Data, Diabetes Prediction, Healthcare, Logistic Regression, Machine Learning.

Abstract

The rising global prevalence of diabetes has intensified the demand for accurate, early diagnostic systems. This review examines studies that use clinical data and machine learning to predict diabetes. Common preprocessing steps in the reviewed work include encoding categorical variables, handling missing values, and normalization. To improve model performance, studies apply feature selection and dimensionality reduction techniques such as Principal Component Analysis (PCA). Supervised learning algorithms, including Random Forest, Support Vector Machines (SVM), k-Nearest Neighbours (k-NN), Logistic Regression, and Decision Trees, are compared using accuracy, precision, recall, F1-score, and AUC-ROC. Many studies rely on small datasets, which limits generalizability even when reported accuracy is high. The review highlights shortcomings in model interpretability and validation and emphasizes the need for more diverse datasets and clinically relevant, interpretable models.
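As a rough illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes accuracy, precision, recall, F1-score, and AUC-ROC for a binary classifier. The function names (`classification_metrics`, `auc_roc`), the 0.5 decision threshold, and the toy labels and scores are assumptions made for this example only; they are not taken from the reviewed studies.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1, 0.8, 0.3]   # predicted probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # assumed 0.5 threshold

print(classification_metrics(y_true, y_pred))
print(auc_roc(y_true, scores))
```

The threshold-based metrics depend on the chosen cut-off, whereas AUC-ROC summarizes ranking quality across all cut-offs, which is one reason comparisons often report both kinds of metric.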

Published

20.12.2023

How to Cite

Abdul Aamir Khan. (2023). A Comprehensive Review on Diabetes Disease Prediction and Risk Modeling Using Machine Learning Algorithms. International Journal of Intelligent Systems and Applications in Engineering, 11(11s), 997–1007. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/8004

Section

Research Article