A Comprehensive Review on Diabetes Disease Prediction and Risk Modeling Using Machine Learning Algorithms
Keywords:
Clinical Data, Diabetes Prediction, Healthcare, Logistic Regression, Machine Learning.
Abstract
The rising global prevalence of diabetes has intensified the demand for accurate, early diagnostic systems and precise prediction techniques. This review examines studies that use clinical data and machine learning approaches to predict diabetes. Common preprocessing steps include encoding categorical data, handling missing values, and normalization. To improve model performance, feature selection and dimensionality reduction techniques such as Principal Component Analysis (PCA) are employed. Supervised learning algorithms such as Random Forest, Support Vector Machines (SVM), k-Nearest Neighbours (k-NN), Logistic Regression, and Decision Trees are compared using metrics such as accuracy, precision, recall, F1-score, and AUC-ROC. Many studies rely on small datasets, which limits generalizability even when the reported accuracy is high. The review highlights shortcomings in model interpretability and validation, and emphasizes the need for diverse datasets and clinically relevant, interpretable models.
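The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. The labels, scores, and the 0.5 decision threshold below are made-up toy values, not data from any reviewed study, and the function names are hypothetical helpers introduced only for this illustration.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 means diabetic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive case receives a
    higher score than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 3 diabetic and 7 non-diabetic patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, round(auc, 3))
```

On this toy data the accuracy is 0.8, while precision, recall, and F1 are each 2/3 and the AUC-ROC is 18/21 (about 0.857), which shows why accuracy is usually reported alongside the other metrics when the classes are imbalanced.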
License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, users must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.


