Analysis of Class Imbalanced Brain Tumor Using Machine Learning Techniques

Authors

  • Prabhat Kumar Sahu Department of Computer Science and Information Technology, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, INDIA
  • Mitrabinda Khuntia Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, INDIA
  • Satish Choudhury Department of Electrical Engineering, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, INDIA
  • Binod Kumar Pattanayak Department of Computer Science and Engineering, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, INDIA

Keywords

SMOTE, ENN, SMOTE-ENN, ADASYN, Decision Tree, Logistic Regression, Random Forest, Gaussian Naïve Bayes, Extra Tree Classifiers

Abstract

Healthcare has become a critical global priority, and the intelligent use of clinical datasets is essential for building an effective and efficient healthcare system capable of monitoring and managing people's health. However, class imbalance in real-world datasets, including clinical datasets, poses significant challenges to classifier training and can reduce accuracy, precision, and recall while increasing misclassifications. In this study, we examine the performance of five well-known classifiers (Logistic Regression, Decision Tree, Gaussian Naive Bayes, Random Forest, and Extra Tree classifiers) on imbalanced brain tumor datasets. We also evaluate the effectiveness of four class balancing techniques (SMOTE, ADASYN, ENN, and SMOTE-ENN) in addressing the challenges posed by imbalanced class distributions. Our results indicate that the SMOTE-ENN balancing approach outperforms the other three data balancing strategies when used with all five classifiers; SMOTE, ADASYN, and ENN perform relatively well but fall slightly short of SMOTE-ENN. The identification of SMOTE-ENN as the most effective strategy for handling imbalanced datasets is significant, as it highlights the value of combining over-sampling and under-sampling to obtain a more balanced and representative dataset for training classifiers. By effectively addressing class imbalance, the SMOTE-ENN approach enables the development of more robust and accurate predictive models, improving overall classifier performance on imbalanced brain tumor datasets.
Our study contributes valuable insights into the selection of appropriate data balancing strategies and classifiers when dealing with imbalanced datasets in the healthcare domain. By providing a comprehensive overview of the empirical performance of different classifiers and balancing techniques, we lay a foundation for implementing more effective and reliable supervised machine learning algorithms for clinical data analysis, and our recommendations for handling class-imbalanced datasets further enhance the practical applicability of these findings.
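To illustrate the combined over- and under-sampling idea behind SMOTE-ENN described above, the following minimal NumPy sketch first interpolates synthetic minority samples (the SMOTE step) and then removes samples whose label disagrees with their nearest neighbours (the ENN step). The helper names `smote` and `enn` and the toy two-cluster dataset are illustrative assumptions, not the implementation or data used in the paper:

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by linear interpolation
    between a random minority sample and one of its k nearest minority
    neighbours (the core idea of SMOTE, Chawla et al., 2002)."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(dist)[1:k + 1]      # skip the point itself
        j = rng.choice(nbrs)
        gap = rng.random()                    # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop any sample whose label disagrees
    with the majority label of its k nearest neighbours."""
    keep = np.empty(len(X), dtype=bool)
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dist)[1:k + 1]
        keep[i] = np.bincount(y[nbrs]).argmax() == y[i]
    return X[keep], y[keep]

# Toy imbalanced dataset: 50 majority points vs. 5 minority points.
rng = np.random.default_rng(42)
X_maj = rng.normal(0.0, 1.0, size=(50, 2))
X_min = rng.normal(3.0, 1.0, size=(5, 2))

# SMOTE-ENN = over-sample the minority class, then clean noisy samples.
synthetic = smote(X_min, n_new=45, k=3, rng=rng)
X = np.vstack([X_maj, X_min, synthetic])
y = np.array([0] * 50 + [1] * 50)
X_clean, y_clean = enn(X, y, k=3)
```

In practice, libraries such as imbalanced-learn provide production implementations of these resamplers; the sketch above only shows why the combination helps: over-sampling equalizes class frequencies, while the editing step discards borderline or noisy samples that over-sampling alone would amplify.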


References

Akram, F., Liu, D., Zhao, P., Kryvinska, N., Abbas, S., & Rizwan, M. (2021). Trustworthy intrusion detection in e-healthcare systems. Frontiers in public health, 9, 788347.

Javed, A. R., Shahzad, F., ur Rehman, S., Zikria, Y. B., Razzak, I., Jalil, Z., & Xu, G. (2022). Future smart cities: Requirements, emerging technologies, applications, challenges, and future aspects. Cities, 129, 103794.

Zhang, L., Zhong, Q., & Yu, Z. (2021). Optimization of tumor disease monitoring in medical big data environment based on high-order simulated annealing neural network algorithm. Computational Intelligence and Neuroscience, 2021, 1-9.

Ali, T. M., Nawaz, A., Ur Rehman, A., Ahmad, R. Z., Javed, A. R., Gadekallu, T. R., ... & Wu, C. M. (2022). A sequential machine learning-cum-attention mechanism for effective segmentation of brain tumor. Frontiers in Oncology, 12, 873268.

Senan, E. M., Jadhav, M. E., Rassem, T. H., Aljaloud, A. S., Mohammed, B. A., & Al-Mekhlafi, Z. G. (2022). Early diagnosis of brain tumour MRI images using hybrid techniques between deep and machine learning. Computational and Mathematical Methods in Medicine, 2022.

Rathod, R., & Khan, R. A. H. (2021). Brain tumor detection using deep neural network and machine learning algorithm. PalArch's Journal of Archaeology of Egypt/Egyptology, 18(08), 1085-1093.

Alanazi, M. F., Ali, M. U., Hussain, S. J., Zafar, A., Mohatram, M., Irfan, M., ... & Albarrak, A. M. (2022). Brain tumor/mass classification framework using magnetic-resonance-imaging-based isolated and developed transfer deep-learning model. Sensors, 22(1), 372.

Kumar, T. S., Arun, C., & Ezhumalai, P. (2022). An approach for brain tumor detection using optimal feature selection and optimized deep belief network. Biomedical Signal Processing and Control, 73, 103440.

Alsaif, H., Guesmi, R., Alshammari, B. M., Hamrouni, T., Guesmi, T., Alzamil, A., & Belguesmi, L. (2022). A novel data augmentation-based brain tumor detection using convolutional neural network. Applied Sciences, 12(8), 3773.

Al-Shoukry, S., Rassem, T. H., & Makbol, N. M. (2020). Alzheimer’s diseases detection by using deep learning algorithms: a mini-review. IEEE Access, 8, 77131-77141.

Gab Allah, A. M., Sarhan, A. M., & Elshennawy, N. M. (2021). Classification of brain MRI tumor images based on deep learning PGGAN augmentation. Diagnostics, 11(12), 2343.

Kumar, V., Aydav, P.S.S., Minz, S., 2021. Multi-view ensemble learning using multiobjective particle swarm optimization for high dimensional data classification. J. King Saud Univ.-Comput. Informat. Sci.

Anwar, H., Qamar, U., Muzaffar Qureshi, A.W., 2014. Global optimization ensemble model for classification methods. Sci. World J. 2014.

Shahzad, R.K., Lavesson, N., 2013. Comparative analysis of voting schemes for ensemble-based malware detection. J. Wireless Mobile Netw., Ubiquitous Comput. Dependable Appl. 4 (1), 98–117.

Prusa, J., Khoshgoftaar, T.M., Dittman, D.J., 2015. Using ensemble learners to improve classifier performance on tweet sentiment data. 2015 IEEE International Conference on Information Reuse and Integration. IEEE, pp. 252–257.

Ekbal, A., Saha, S., 2011. A multiobjective simulated annealing approach for classifier ensemble: Named entity recognition in Indian languages as case studies. Expert Syst. Appl. 38 (12), 14760–14772.

Krawczyk, B., Minku, L.L., Gama, J., Stefanowski, J., Woźniak, M., 2017. Ensemble learning for data stream analysis: A survey. Informat. Fusion 37, 132–156.

Dong, X., Yu, Z., Cao, W., Shi, Y., Ma, Q., 2020. A survey on ensemble learning. Front. Comput. Sci. 14 (2), 241–258.

Tsai, C.-F., Lin, Y.-C., Yen, D.C., Chen, Y.-M., 2011. Predicting stock returns by classifier ensembles. Appl. Soft Comput. 11 (2), 2452–2459.

Abellán, J., Mantas, C.J., 2014. Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring. Expert Syst. Appl. 41 (8), 3825–3830.

Catal, C., Tufekci, S., Pirmit, E., Kocabag, G., 2015. On the use of ensemble of classifiers for accelerometer-based activity recognition. Appl. Soft Comput. 37, 1018–1022.

Da Silva, N.F., Hruschka, E.R., Hruschka Jr, E.R., 2014. Tweet sentiment analysis with classifier ensembles. Decis. Support Syst. 66, 170–179.

Aburomman, A.A., Reaz, M.B.I., 2016. A novel svm-knn-pso ensemble method for intrusion detection system. Appl. Soft Comput. 38, 360–372.

Haralabopoulos, G., Anagnostopoulos, I., McAuley, D., 2020. Ensemble deep learning for multilabel binary classification of user-generated content. Algorithms 13 (4), 83.

Alharbi, A., Kalkatawi, M., Taileb, M., 2021. Arabic sentiment analysis using deep learning and ensemble methods. Arabian J. Sci. Eng. 46 (9), 8913–8923.

Can Malli, R., Aygun, M., Kemal Ekenel, H., 2016. Apparent age estimation using ensemble of deep learning models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 9–16.

Ortiz, A., Munilla, J., Gorriz, J.M., Ramirez, J., 2016. Ensembles of deep learning architectures for the early diagnosis of the alzheimer’s disease. Int. J. Neural Syst. 26 (07), 1650025.

Tasci, E., Uluturk, C., Ugur, A., 2021. A voting-based ensemble deep learning method focusing on image augmentation and preprocessing variations for tuberculosis detection. Neural Comput. Appl., 1–15.

Xu, S., Liang, H., Baldwin, T., 2016. Unimelb at semeval-2016 tasks 4a and 4b: An ensemble of neural networks and a word2vec based model for sentiment classification. In: Proceedings of the 10th international Workshop on Semantic Evaluation (SemEval-2016), pp. 183–189.

J. Xiao, SVM and KNN ensemble learning for traffic incident detection, Physica A: Statistical Mechanics and its Applications 517 (2019) 29–35.

R. Polikar, Ensemble learning, in: Ensemble machine learning, Springer, 2012, pp. 1–34.

T. G. Dietterich, et al., Ensemble learning, The handbook of brain theory and neural networks 2 (2002) 110–125.

B. Ženko, L. Todorovski, S. Džeroski, A comparison of stacking with meta decision trees to bagging, boosting, and stacking with other methods, in: Proceedings 2001 IEEE International Conference on Data Mining, IEEE, 2001, pp. 669–670.

X. Dong, Z. Yu, W. Cao, Y. Shi, Q. Ma, A survey on ensemble learning, Frontiers of Computer Science 14 (2020) 241–258.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16, 321-357.

He, H., Bai, Y., Garcia, E. A., & Li, S. (2008, June). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence) (pp. 1322-1328). IEEE.

Rahman, M. M., & Davis, D. N. (2013). Addressing the class imbalance problem in medical datasets. International Journal of Machine Learning and Computing, 3(2), 224.

Batista, G. E., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD explorations newsletter, 6(1), 20-29.

Kovács, G. (2019). An empirical comparison and evaluation of minority oversampling techniques on a large number of imbalanced datasets. Applied Soft Computing, 83, 105662.

Safavian, S. R., & Landgrebe, D. (1991). A survey of decision tree classifier methodology. IEEE transactions on systems, man, and cybernetics, 21(3), 660-674.

Guo, G., Wang, H., Bell, D., Bi, Y., & Greer, K. (2003). KNN model-based approach in classification. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE: OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2003, Catania, Sicily, Italy, November 3-7, 2003. Proceedings (pp. 986-996). Springer Berlin Heidelberg.

Khuntia, M., Sahu, P. K., & Devi, S. (2022). Prediction of Presence of Brain Tumor Utilizing Some State-of-the-Art Machine Learning Approaches. International Journal of Advanced Computer Science and Applications, 13(5).

Khuntia, M., Sahu, P. K., & Devi, S. (2022). Novel Strategies Employing Deep Learning Techniques for Classifying Pathological Brain from MR Images. International Journal of Advanced Computer Science and Applications, 13(11).

Khuntia, M., Sahu, P. K., & Devi, S. (2023). Deep Learning Approaches for Brain Tumor Diagnosis using Fused Layer Accelerator. Journal of Computer Science, 19(2).


Published

24.03.2024

How to Cite

Sahu, P. K., Khuntia, M., Choudhury, S., & Pattanayak, B. K. (2024). Analysis of Class Imbalanced Brain Tumor Using Machine Learning Techniques. International Journal of Intelligent Systems and Applications in Engineering, 12(18s), 547–560. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/5003

Section

Research Article
