Enhancing Software Defect Projections Performance by Class Rebalancing

Authors

  • Ranjeetsingh Suryawanshi, Bharati Vidyapeeth Deemed To Be University, College of Engineering, India
  • Amol Kadam, Bharati Vidyapeeth Deemed To Be University, College of Engineering, India

Keywords:

Class Imbalance, Cost-sensitive learning, Median absolute deviation, Sampling methods, Software Defect Prediction

Abstract

Data is among the most valuable assets of any technology firm, since it has the ability to transform an entire industry. However, wide availability does not imply quality; publicly available datasets suffer from numerous quality issues. Class imbalance is one such problem: one class label appears far more frequently than the others. It occurs in almost every domain, including medical diagnosis, natural language processing, and fraud detection. The issue is serious because it undermines existing machine learning models by reducing their ability to classify previously unseen data. Many strategies have been proposed to address it, including oversampling and undersampling. These procedures, however, either overgeneralize or over-diversify the data points, which limits their effectiveness. Ensemble techniques are another option that gives good results, but at the cost of high complexity. To address this challenge more effectively, we propose a model that uses mean absolute deviation to select the features with the greatest impact on the outcome, thereby reducing dimensionality, and the adaptive synthetic data creation technique to balance the data. The model prioritizes recall over accuracy, since recall is a crucial metric when dealing with imbalanced data. The results are promising, with an accuracy of nearly 85% and a recall of 89%. The model's AUC is 0.93, indicating its ability to distinguish positive from negative instances.
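The two steps named in the abstract, feature selection by mean absolute deviation and adaptive synthetic oversampling, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that features with a larger mean absolute deviation are the ones kept, that features are on comparable scales, and that the balancing step resembles ADASYN-style sampling; the function names `top_k_features` and `adasyn_like` and the parameters `k` and `seed` are hypothetical.

```python
import math
import random


def mean_abs_deviation(column):
    """Mean absolute deviation of one feature column around its mean."""
    mu = sum(column) / len(column)
    return sum(abs(v - mu) for v in column) / len(column)


def top_k_features(X, k):
    """Indices of the k features with the largest mean absolute deviation.

    Assumption (not from the source): larger deviation means 'more useful'.
    """
    n_features = len(X[0])
    scores = [mean_abs_deviation([row[j] for row in X]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def nearest(point, candidates, k):
    """Up to k candidates closest to point (Euclidean), excluding point itself."""
    others = [c for c in candidates if c is not point]
    return sorted(others, key=lambda c: math.dist(point, c))[:k]


def adasyn_like(X, y, minority=1, k=3, seed=0):
    """ADASYN-style oversampling sketch: harder minority points get more synthetics."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    # Number of synthetic points needed for (approximate) full balance.
    to_generate = (len(X) - len(minority_pts)) - len(minority_pts)
    if to_generate <= 0 or len(minority_pts) < 2:
        return list(X), list(y)

    # For each minority point: share of majority labels among its k nearest neighbours.
    ratios = []
    for x in minority_pts:
        neigh = sorted(
            ((p, label) for p, label in zip(X, y) if p is not x),
            key=lambda pair: math.dist(x, pair[0]),
        )[:k]
        ratios.append(sum(label != minority for _, label in neigh) / len(neigh))

    total = sum(ratios)
    if total == 0:
        weights = [1 / len(ratios)] * len(ratios)  # no hard points: spread evenly
    else:
        weights = [r / total for r in ratios]

    new_X, new_y = list(X), list(y)
    for x, w in zip(minority_pts, weights):
        partners = nearest(x, minority_pts, k)
        # Rounding means the final class counts may differ slightly from exact balance.
        for _ in range(round(w * to_generate)):
            z = rng.choice(partners)
            lam = rng.random()
            # Synthetic point on the segment between x and a minority neighbour.
            new_X.append([a + lam * (b - a) for a, b in zip(x, z)])
            new_y.append(minority)
    return new_X, new_y
```

In a pipeline of this shape, `top_k_features` would be applied first to reduce the columns of the training data, and `adasyn_like` would then be applied to the training split only, so that synthetic points never leak into the data used to measure recall and AUC.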

Published

12.01.2024

How to Cite

Suryawanshi, R., & Kadam, A. (2024). Enhancing Software Defect Projections Performance by Class Rebalancing. International Journal of Intelligent Systems and Applications in Engineering, 12(12s), 183–191. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4503

Section

Research Article