Diagnosis of Breast Cancer Using Improved Machine Learning Algorithms Based on Bayesian Optimization
Abstract
Breast cancer is one of the most common types of cancer and is the second leading cause of cancer death in females. Early detection of breast cancer is crucial for patient survival as well as for quality of life throughout cancer treatment. The aim of this study is to develop improved machine learning models for the early diagnosis of breast cancer with high accuracy. In this context, the performance of machine learning algorithms including Support Vector Machines, Decision Trees, Naive Bayes, K-Nearest Neighbor, and Ensemble Classifiers was compared on a dataset consisting of routine blood analysis combined with anthropometric measurements. Neighborhood component analysis was applied as a feature selection method to reveal relevant biomarkers that can be used in breast cancer prediction. To assess the performance of each proposed classifier model, two data division procedures, hold-out and 10-fold cross-validation, were employed. The Bayesian optimization algorithm was applied to all classifiers to maximize prediction accuracy. Accuracy, precision, sensitivity, specificity, and F-measure were used as performance criteria to measure the success of each classifier. Experimental results show that the Bayesian optimization-based K-Nearest Neighbor performs better than the other machine learning algorithms under the hold-out data division protocol, with an accuracy of 95.833%. The results obtained in this study may provide a new perspective on the application of improved machine learning techniques for the early detection of breast cancer.
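The five performance criteria named in the abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the usage lines are illustrative only, not the confusion matrix from the study. They were chosen simply because 23 correct predictions out of 24 yields the same 95.833% figure as an arithmetic example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (positive class = breast cancer)."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, sensitivity, specificity and F-measure."""
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        # F-measure: harmonic mean of precision and sensitivity.
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Illustrative (assumed) counts: 24 test cases with a single false positive.
counts = ConfusionCounts(tp=12, fp=1, tn=11, fn=0)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.5f}")
```

Reporting sensitivity and specificity alongside accuracy matters in a diagnostic setting, because a missed cancer (false negative) and a false alarm (false positive) carry very different costs that accuracy alone does not separate.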
Copyright (c) 2020 ZEYNEP CEYLAN
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate whether changes were made; anyone who remixes, transforms, or builds upon the material must distribute their contributions under the same license as the original.