Parameter Analysis of Differential Evolution Based Oversampling Approach for Highly Imbalanced Datasets
Abstract
Nowadays, almost all activities are recorded in databases. Data mining methods such as classifiers use these datasets to discover hidden patterns and rules. Classification methods are generally developed with approximately balanced datasets in mind. However, imbalanced datasets, in which the classes contain unequal numbers of instances, are a common problem in most real domains. Many data-level approaches have been proposed to enable better classification of imbalanced datasets. The Differential Evolution Based Oversampling Approach for Highly Imbalanced Datasets (DEBOHID) is one of the methods proposed to handle this issue. DEBOHID uses the crossover and mutation processes of differential evolution (DE) to generate new synthetic samples. The parameters used by the crossover and mutation processes affect the solution quality. Therefore, this study investigates the solution quality on highly imbalanced datasets for different crossover and mutation parameter values of DEBOHID. Experimental studies are carried out using three classifiers and two evaluation metrics for the different parameter values. The obtained results are compared with well-known approaches in the literature.
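To make the role of the two parameters concrete, the following is a minimal sketch of how DE-style mutation and crossover can generate synthetic minority samples. It is not the DEBOHID implementation: the function name `de_oversample`, the parameter names `f` (mutation scale factor) and `cr` (crossover rate), the classic DE/rand/1 mutation with binomial crossover, and the purely random choice of donor samples are all assumptions made for illustration.

```python
import random
from typing import List, Optional

Vector = List[float]


def de_oversample(minority: List[Vector], n_new: int, f: float = 0.5,
                  cr: float = 0.9, seed: Optional[int] = None) -> List[Vector]:
    """Generate n_new synthetic samples from minority-class samples.

    Hypothetical helper: uses DE/rand/1 mutation and binomial crossover.
    f and cr are the mutation and crossover parameters, respectively.
    """
    if len(minority) < 4:
        raise ValueError("need at least four minority samples")
    rng = random.Random(seed)
    dim = len(minority[0])
    synthetic: List[Vector] = []
    for _ in range(n_new):
        # Pick a target and three donors at four distinct positions.
        target, r1, r2, r3 = rng.sample(minority, 4)
        # Mutation: shift r1 by the scaled difference of r2 and r3.
        mutant = [r1[j] + f * (r2[j] - r3[j]) for j in range(dim)]
        # Binomial crossover: take each feature from the mutant with
        # probability cr; feature j_rand always comes from the mutant.
        j_rand = rng.randrange(dim)
        trial = [
            mutant[j] if (rng.random() < cr or j == j_rand) else target[j]
            for j in range(dim)
        ]
        synthetic.append(trial)
    return synthetic
```

In this sketch, a larger `f` moves synthetic samples further from the existing minority samples, while a larger `cr` makes each synthetic sample inherit more features from the mutant and fewer from the target sample. Calling `de_oversample(minority, n_new=10, f=0.5, cr=0.9, seed=0)` on a list of at least four feature vectors returns ten new vectors of the same dimension.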
Copyright (c) 2021 Mehmet Akif Sahman
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
IJISAE open access articles are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.