An Evaluation of Machine Learning Algorithms and Feature Selection Methods for Cervical Cancer Risk Prediction using Clinical Features

Authors

  • Sokaina El Khamlichi, Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaâdi University, Tangier, Morocco, https://orcid.org/0000-0001-8848-9661
  • Ikram Ben Abdel Ouahab, Computer Science, Systems and Telecommunication Laboratory (LIST), Faculty of Sciences and Techniques, Abdelmalek Essaadi University, Tangier, Morocco, https://orcid.org/0000-0003-0955-6382
  • Mohammed Bouhorma, Computer Science, Systems and Telecommunication Laboratory (LIST), Faculty of Sciences and Techniques, Abdelmalek Essaadi University, Tangier, Morocco, https://orcid.org/0000-0002-5687-5231
  • Fatiha Elouaai, Computer Science, Systems and Telecommunication Laboratory (LIST), Faculty of Sciences and Techniques, Abdelmalek Essaadi University, Tangier, Morocco, https://orcid.org/0000-0002-7139-5682
  • Abdelfettah Sedqui, Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaâdi University, Tangier, Morocco, https://orcid.org/0000-0002-7446-0400
  • Amal Maurady, Laboratory of Innovative Technologies, National School of Applied Sciences of Tangier, Abdelmalek Essaâdi University, Tangier, Morocco, https://orcid.org/0000-0001-9298-717X

Keywords:

Cervical Cancer, Decision Tree, Feature Selection, RFE, SMOTE, Supervised Machine Learning

Abstract

Cervical cancer is one of the most frequent gynecological cancers worldwide. It is associated with several risk factors, such as sexually transmitted diseases, human papillomavirus (HPV) infection and smoking. Early diagnosis of this disease is crucial to lowering fatality rates, and early prediction can help clinicians and patients pursue effective treatment. This study compares machine learning classifiers to determine the best model for predicting cervical cancer and to identify its most significant risk factors. Five algorithms are compared: K-Nearest Neighbor, Gaussian Naïve Bayes, Logistic Regression, Random Forest and Decision Tree (DT). The study then improves the results of the DT algorithm by balancing the data with the Synthetic Minority Oversampling Technique (SMOTE), selecting the most important features with Recursive Feature Elimination (RFE), and tuning hyperparameters with Grid Search. Overall, the combination of the Decision Tree classifier with SMOTE and Grid Search hyperparameter tuning yields the best-performing model.
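SMOTE is the balancing step in the best-performing pipeline described above. The article's own code is not reproduced on this page, so the following is only a minimal sketch of the SMOTE interpolation idea (Chawla et al.) in plain Python, assuming numeric feature vectors: each synthetic sample lies on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote`, the toy data and the neighbour count are illustrative assumptions; in practice a library implementation such as imbalanced-learn's `SMOTE` would typically be used instead.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical helper: create n_synthetic minority samples by
    interpolating between a minority sample and one of its k nearest
    minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    rng = random.Random(seed)      # seeded for reproducible output

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Point] = []
    for s in range(n_synthetic):
        i = s % len(minority)                   # cycle through minority samples
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        gap = rng.random()                      # uniform in [0, 1)
        # New point on the segment from x towards its chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy, made-up minority class with two features and a majority of 9 rows.
    minority = [[1.0, 1.0], [2.0, 1.0], [1.5, 2.0]]
    majority_count = 9
    new = smote(minority, n_synthetic=majority_count - len(minority), k=2)
    print(len(minority) + len(new))  # -> 9
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points derived from test samples do not leak into the evaluation.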

Flowchart of the proposed solution

Published

16.12.2022

How to Cite

Khamlichi, S. E., Ouahab, I. B. A., Bouhorma, M., Elouaai, F., Sedqui, A., & Maurady, A. (2022). An Evaluation of Machine Learning Algorithms and Feature Selection Methods for Cervical Cancer Risk Prediction using Clinical Features. International Journal of Intelligent Systems and Applications in Engineering, 10(4), 470–479. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/2284

Section

Research Article