Parameter Analysis of Differential Evolution Based Oversampling Approach for Highly Imbalanced Datasets

Keywords: Imbalanced data learning, Differential evolution, Oversampling, Class imbalance, Parameter Analysis

Abstract

Nowadays, records of almost all activities are stored in databases. Data mining methods such as classifiers use these datasets to discover hidden patterns and rules. Methods proposed for classification problems are generally developed with approximately balanced datasets in mind. However, imbalanced datasets, whose classes contain unequal numbers of instances, emerge as a common problem in most real-world domains. Many data-level approaches have been proposed to enable better classification of imbalanced datasets. The Differential Evolution Based Oversampling Approach for Highly Imbalanced Datasets (DEBOHID) is one of the methods proposed to handle this issue. DEBOHID uses the crossover and mutation processes of differential evolution (DE) to generate new synthetic samples. The parameters used by the crossover and mutation processes affect the solution quality. Therefore, this study investigates the solution quality obtained on highly imbalanced datasets for different crossover and mutation parameter values of DEBOHID. Experimental studies are carried out using three classifiers and two evaluation metrics for the different parameter values. The obtained results are compared with well-known approaches from the literature.
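To make the role of the mutation and crossover parameters concrete, the following is a minimal sketch of how DE operators could generate synthetic minority samples. It assumes the generic DE/rand/1 mutation with binomial crossover and is not taken from the DEBOHID paper; the function name `de_oversample`, the parameter names `F` (mutation scale factor) and `CR` (crossover rate), their default values, and the random choice of donor samples are all illustrative assumptions.

```python
import numpy as np

def de_oversample(minority, n_new, F=0.5, CR=0.9, seed=0):
    """Sketch: create n_new synthetic samples from minority-class rows
    using generic DE/rand/1 mutation and binomial crossover.
    F and CR are the usual DE parameter names; the defaults here are
    assumptions, not values reported in the study."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n, d = minority.shape
    if n < 4:
        raise ValueError("need at least 4 minority samples")
    synthetic = np.empty((n_new, d))
    for k in range(n_new):
        # Pick a target and three donors, all distinct rows.
        target, r1, r2, r3 = rng.choice(n, size=4, replace=False)
        # Mutation: base vector plus a scaled difference of two donors.
        mutant = minority[r1] + F * (minority[r2] - minority[r3])
        # Binomial crossover: take each feature from the mutant with
        # probability CR, otherwise keep the target's value.
        cross = rng.random(d) < CR
        cross[rng.integers(d)] = True  # at least one mutant feature
        synthetic[k] = np.where(cross, mutant, minority[target])
    return synthetic
```

For example, `de_oversample(X_min, n_new=100, F=0.3, CR=0.6)` would return a 100-row array of synthetic samples, where `X_min` is a hypothetical array holding the minority-class rows. In this sketch a larger `F` pushes synthetic samples further from existing ones, while a larger `CR` copies more features from the mutant than from the target sample, which is the kind of sensitivity a parameter analysis examines.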

Published
2021-06-30
How to Cite
[1] M. Sahman, “Parameter Analysis of Differential Evolution Based Oversampling Approach for Highly Imbalanced Datasets”, IJISAE, vol. 9, no. 2, pp. 69-80, Jun. 2021.
Section
Research Article