Supervised Machine Learning-Based Classification and Prediction of Breast Cancer

Authors

  • Sumeet Mathur, Sandeep Gupta

Keywords:

LGBM, Machine Learning, Breast Cancer, Gradient Boost, Wisconsin Dataset.

Abstract

Breast cancer is one of the deadliest diseases afflicting women, and there is currently no cure for it. The number of deaths from breast cancer rises every year, and it is the leading cause of cancer mortality among women worldwide. Any progress in the detection and treatment of cancer is therefore essential, and effective treatment and patient survival depend on highly accurate prognosis. Since the advent of AI, ML techniques have facilitated the detection of breast cancer, allowing earlier diagnosis and thereby improving the prognosis for patients. Because of their effectiveness, ML methods are attracting close attention from researchers and may soon have a major influence on the prediction and early diagnosis of breast cancer (BC). This article presents an ML-based method for automated BC screening. To facilitate early detection of BC, several classification and prognostic models are developed using ML techniques, including gradient boosting and the Light Gradient Boosting Machine (LGBM). The models are trained and tested on the BC Wisconsin (Diagnostic) dataset (BCWD) from the UCI ML Repository. The study also examines the impact of the feature selection, data balancing, and data preparation methods applied to the input dataset. The aim is to identify the best ML algorithm for predicting and diagnosing BC using accuracy, precision, recall, F1-score, and confusion matrices. The results show that LGBM performed best among the classifiers, with the highest accuracy (98%). All work was carried out in Python and its associated libraries in the Jupyter Notebook environment.
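The evaluation metrics named in the abstract all follow from the four cells of a binary confusion matrix. The following is a minimal sketch of that relationship, not the authors' code: the function names, the choice of 1 (malignant) as the positive class, and the ten example labels are assumptions made up for illustration and are not results from the study.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    # Count each (actual is positive, predicted is positive) combination.
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # malignant correctly flagged
    fp = pairs[(False, True)]   # benign wrongly flagged as malignant
    fn = pairs[(True, False)]   # malignant missed
    tn = pairs[(False, False)]  # benign correctly cleared
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for illustration only (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(y_true, y_pred)
print(counts)
print(metrics)
```

In a screening setting, recall on the malignant class is often the metric to watch most closely, because a false negative (a missed malignancy) is usually costlier than a false positive; accuracy alone can hide this when the classes are imbalanced.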

Published

24.03.2024

How to Cite

Sumeet Mathur. (2024). Supervised Machine Learning-Based Classification and Prediction of Breast Cancer. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 3936 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6081

Section

Research Article