A Broad Review on Different Imbalanced Dataset Classification Approaches

Authors

  • K. Suresh Babu Associate Professor, Department of Computer Science and Engineering, Narasaraopet Engineering College, Narasaraopet, Andhra Pradesh, India.
  • B. Vara Prasada Rao Professor, Department of Computer Science and Engineering, RVR&JC College of Engineering, Guntur, Andhra Pradesh, India.
  • Y. Narasimha Rao Professor, School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India.
  • J. Hari Kiran Associate Professor, School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India.
  • Sai Chandana Bolem Assistant Professor, School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India.
  • Srinivasa Rao Battula Assistant Professor, School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India.
  • Koteswara Rao Chittepureddi Assistant Professor, School of Computer Science and Engineering, VIT-AP University, Amaravati, Andhra Pradesh, India.
  • Y. Anil Kumar Research Scholar, VIT-AP University, Amaravati, Vijayawada, India.
  • Ratna Babu Chekka Associate Professor, Department of Computer Science and Engineering, RVR&JC College of Engineering, Guntur, Andhra Pradesh, India.

Keywords:

Imbalanced data, data mining, deep learning, classifiers, oversampling and undersampling, optimization algorithms

Abstract

Managing imbalanced data is a serious problem in large-scale data applications. Imbalanced data classification systems were therefore introduced to handle such data as effectively as possible, and several neural mechanisms have been designed to classify imbalanced data with a high accuracy rate. However, the complexity of the data makes classification difficult by increasing computation cost, resource usage, and algorithm complexity. This review therefore details the performance of several classification models on different imbalanced databases. A performance analysis compares the models, and their robustness is estimated in terms of precision, specificity, accuracy, and sensitivity. The merits and limitations of each model are also discussed in detail. Finally, based on the identified demerits, future directions for improving imbalanced data classification are outlined.
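As a brief illustration of the four measures named in the abstract, the following minimal sketch computes them from binary confusion-matrix counts. The function name `binary_metrics`, the convention of returning 0.0 for an undefined ratio, and the example counts are assumptions made here for illustration; they are not taken from the reviewed models.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity, and specificity
    from binary confusion-matrix counts (positive = minority class)."""

    def ratio(num, den):
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    total = tp + fp + tn + fn
    return {
        "accuracy": ratio(tp + tn, total),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # recall on the minority class
        "specificity": ratio(tn, tn + fp),   # recall on the majority class
    }


# Hypothetical data: 990 majority samples, 10 minority samples,
# and a classifier that labels every sample as majority.
scores = binary_metrics(tp=0, fp=0, tn=990, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

In this hypothetical case the accuracy is high while the sensitivity is zero, which suggests why accuracy alone can be misleading on imbalanced data and why the other three measures are reported alongside it.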

Published

21.09.2023

How to Cite

Babu, K. S., Rao, B. V. P., Rao, Y. N., Kiran, J. H., Bolem, S. C., Battula, S. R., Chittepureddi, K. R., Kumar, Y. A., & Chekka, R. B. (2023). A Broad Review on Different Imbalanced Dataset Classification Approaches. International Journal of Intelligent Systems and Applications in Engineering, 11(4), 27–40. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/3451

Section

Research Article