A Comprehensive Review: SMOTE-Based Oversampling Methods for Imbalanced Classification Techniques, Evaluation, and Result Comparisons

Authors

  • Bhushankumar Nemade, Mukesh Patel School of Technology Management & Engineering, NMIMS University, India
  • Vinayak Bharadi, Finolex Academy of Management and Technology, Mumbai University, India
  • Sujata S. Alegavi, Thakur College of Engineering and Technology, Mumbai University, India
  • Bijith Marakarkandy, S.P. Mandali’s Prin. L. N. Welingkar Institute of Management Development & Research (WeSchool), Mumbai, India

Keywords:

Imbalanced classification, class imbalance, oversampling methods, Synthetic Minority Over-sampling Technique (SMOTE), imbalanced datasets, minority class, synthetic samples, comparative analysis, evaluation methodologies, challenges, Borderline-SMOTE, CCR-SMOTE-LR, SMOTE-ENC, Adaptive-SMOTE, SMOTE-Tomek, Safe-Level SMOTE, SMOTE-Boost, LN-SMOTE, Bagging-SMOTE, SVM-SMOTE

Abstract

This study conducts a comparative analysis of SMOTE variants and assesses their effectiveness across diverse domains. By synthesizing the findings, it provides insights into the strengths, limitations, and future directions of oversampling methods, with a specific emphasis on SMOTE-based techniques. Through an in-depth survey of research papers and articles, it explores the principles, techniques, evaluation methodologies, and challenges associated with oversampling, and is intended as a resource that supports informed decision-making by researchers and practitioners in imbalanced classification. The paper also describes a proposed system composed of six parts: real-time data collection, data cleaning, feature extraction, handling of imbalanced data using various methods, selection of preferred classifiers, and a voting principle for optimal prediction. The system uses this multi-model classification approach to improve the efficiency of an aquaponics ecosystem. Applying voting-based optimal prediction, it evaluates four classifiers on benchmark parameters including accuracy, time, recall, and Kappa, and on that basis identifies XGBoost and Random Forest as the most effective classifiers.
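The review centres on SMOTE-based oversampling. As a hedged illustration of the mechanism the surveyed variants build on, the Python sketch below implements only the core interpolation step of the original SMOTE (Chawla et al., 2002): a synthetic point is placed at a random position on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The function names `smote` and `k_nearest` are hypothetical, the base sample is drawn at random rather than iterating over every minority sample as the original formulation does, and no variant-specific logic (borderline detection, safe levels, Tomek-link cleaning, and so on) is included. Maintained implementations exist in libraries such as imbalanced-learn.

```python
import math
import random


def k_nearest(index, points, k):
    """Indices of the k points nearest to points[index] by Euclidean
    distance, excluding `index` itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return others[:k]


def smote(minority, n_synthetic, k=5, seed=0):
    """Sketch of the core SMOTE step (hypothetical helper, not a library API):
    create `n_synthetic` points by interpolating between minority samples and
    their minority-class nearest neighbours. `minority` is a list of
    equal-length numeric tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)  # seeded so results are reproducible
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))           # random minority sample
        j = rng.choice(k_nearest(i, minority, k))  # one of its k nearest minority neighbours
        base, neighbour = minority[i], minority[j]
        gap = rng.random()                         # interpolation factor in [0, 1)
        # new point = base + gap * (neighbour - base), feature by feature
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


if __name__ == "__main__":
    minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
    for point in smote(minority, n_synthetic=4, k=3):
        print(point)
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the convex hull of the minority class. Variants such as Borderline-SMOTE and Safe-Level SMOTE differ mainly in which samples they admit to this step and, in some cases, in how the interpolation factor is chosen.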
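The abstract names a voting principle and the benchmark parameters accuracy, time, recall, and Kappa, but does not spell out how the votes are cast. Assuming the most common reading, hard majority voting over the per-sample predictions of several classifiers, the sketch below shows how an ensemble prediction could be formed and then scored with recall and Cohen's kappa. It uses plain Python; the function names and the toy labels are hypothetical and are not taken from the paper. In practice, scikit-learn's `VotingClassifier`, `recall_score`, and `cohen_kappa_score` cover the same ground.

```python
import math
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, return the label predicted by most models.
    On a tie, Counter.most_common keeps the label encountered first, i.e. the
    one from the earliest model in the list."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions_per_model)]


def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` samples that were predicted as `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance, (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # chance agreement from the two label distributions
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1:
        return math.nan  # undefined: both sequences use one and the same label
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    model_a = [1, 0, 1, 0, 0, 0, 1, 1]
    model_b = [1, 1, 1, 1, 0, 0, 0, 0]
    model_c = [0, 0, 1, 1, 0, 1, 1, 0]
    ensemble = majority_vote([model_a, model_b, model_c])
    for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
        print(name, recall(y_true, pred), round(cohen_kappa(y_true, pred), 3))
```

In this toy example each individual model scores recall 0.75 and kappa 0.5, while the majority vote reproduces `y_true` exactly (recall 1.0, kappa 1.0), because the three models make their errors on different samples.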

Published

12.07.2023

How to Cite

Nemade, B., Bharadi, V., Alegavi, S. S., & Marakarkandy, B. (2023). A Comprehensive Review: SMOTE-Based Oversampling Methods for Imbalanced Classification Techniques, Evaluation, and Result Comparisons. International Journal of Intelligent Systems and Applications in Engineering, 11(9s), 790–803. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3268

Issue

Vol. 11 No. 9s (2023)

Section

Research Article