Breast Cancer Detection Using Random Forest Supported by Feature Selection

Authors

  • Mustafa Ali Hasan Dalfi, ATISP Research Unit, National School of Electronics and Telecommunications of Sfax (ENET'Com), University of Sfax, Sfax, Tunisia
  • Sihem Chaabouni, Lab. SM@RTS, Centre de Recherche en Numérique de Sfax, Tunisia; ENET'Com, Sfax University, Technopole de Sfax, 3021 Sfax, Tunisia
  • Ahmed Fakhfakh, Lab. SM@RTS, Centre de Recherche en Numérique de Sfax, Tunisia

Keywords:

Breast Cancer, Early Detection, Feature Selection, Mammography, Random Forests

Abstract

Breast cancer has claimed approximately 1.5 million lives over the past 35 years, despite considerable investment in mammography-based detection and treatment. This persistently high death rate underscores the need for better strategies. Research consistently emphasizes the importance of detecting cancer early, ideally while the tumor is only 5-10 millimeters in size, which reduces the need for aggressive treatments such as intensive chemotherapy or radiation. However, current primary detection methods often fail to identify such small tumors, particularly in dense breast tissue, so more effective screening techniques are needed. In this study, we propose a machine-learning-based methodology for breast cancer detection that uses a feature-selection-aided Random Forest algorithm. The framework incorporates four feature selection techniques, namely Variance Inflation Factor (VIF), model-based feature selection, Recursive Feature Elimination, and univariate feature selection, to extract highly relevant features and uncover hidden patterns associated with tumors. In experiments on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, feature selection based on VIF achieved 98.83% accuracy. The approach identifies the most appropriate features and thereby improves the performance of the breast cancer detection system.
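
As an illustration of the VIF step named in the abstract, the following is a minimal sketch, not the authors' implementation. It uses the standard definition VIF_j = 1 / (1 - R_j^2), computed as the j-th diagonal entry of the inverse correlation matrix, and repeatedly drops the feature with the largest VIF. The threshold of 10 is a common rule of thumb and an assumption here, as are the function names and the synthetic toy columns (they are not WDBC data). The surviving features would then be passed to a Random Forest classifier, which is not shown.

```python
import random


def _pearson_matrix(cols):
    """Correlation matrix of a list of equal-length feature columns."""
    n = len(cols[0])
    means = [sum(c) / n for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(cols)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def _invert(m):
    """Gauss-Jordan inverse with partial pivoting (assumes m is non-singular)."""
    k = len(m)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_scores(cols):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inv = _invert(_pearson_matrix(cols))
    return [inv[j][j] for j in range(len(cols))]


def select_by_vif(features, threshold=10.0):
    """Drop the highest-VIF feature until every remaining VIF is <= threshold."""
    kept = dict(features)
    while len(kept) > 1:
        names = list(kept)
        scores = vif_scores([kept[n] for n in names])
        worst = max(range(len(names)), key=lambda i: scores[i])
        if scores[worst] <= threshold:
            break
        del kept[names[worst]]
    return list(kept)


# Synthetic toy data (an assumption for illustration, not WDBC values):
# "perimeter" is almost a linear function of "radius", so the two are collinear.
rng = random.Random(0)
radius = [rng.gauss(14, 3) for _ in range(200)]
perimeter = [6.28 * r + rng.gauss(0, 0.5) for r in radius]
texture = [rng.gauss(19, 4) for _ in range(200)]

selected = select_by_vif(
    {"radius": radius, "perimeter": perimeter, "texture": texture}
)
print(selected)
```

In this toy setting one of the two collinear columns is removed and the independent column is retained, which is the behaviour VIF-based selection is meant to provide before the classifier is trained.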

Published

27.10.2023

How to Cite

Hasan Dalfi, M. A., Chaabouni, S., & Fakhfakh, A. (2023). Breast Cancer Detection Using Random Forest Supported by Feature Selection. International Journal of Intelligent Systems and Applications in Engineering, 12(2s), 223–238. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3575

Section

Research Article