Automated Classification of Code Review Comments using Deep Neural Network-based Architecture

Authors

  • Gobind Panditrao, Shashank Joshi, Sunita Dhotre, Sandeep Vanjale

Keywords

Automated classification, Code review comments, CodeBERT, Long Short-Term Memory, Deep Neural Network

Abstract

Code review comments are essential inputs to automated code review systems that improve software quality and developer productivity. This study demonstrates the classification of code review comments using a Deep Neural Network with a hybrid architecture that combines CodeBERT and Long Short-Term Memory. Leveraging a dataset from the OpenDev Nova initiative, the study employed a five-class classification model to identify specific types of review comments, such as discussions, documentation changes, and false positives. The approach involved modifying and retraining a model already proposed in the existing literature and adapting it to the project environment by restoring the required attributes using standard libraries. The performance of the modified model was observed across different epochs, with precision, recall, F1-score, and accuracy used to establish its effectiveness. The main results indicated notable improvements in the handling of complex comment types as well as in overall accuracy compared with previously established models. The analysis supports the viability of Deep Neural Networks in providing a reliable classification system that accounts for code nuances and context. The study also notes that the generalizability of its results is limited by dataset specificity and suggests possible remedies, including the use of alternative neural network architectures and the inclusion of a wider range of development environments in the datasets.
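
The abstract does not include an implementation listing, but the architecture it describes (a CodeBERT encoder feeding a Long Short-Term Memory layer with a five-class output head) can be illustrated with a minimal PyTorch sketch. The checkpoint name (microsoft/codebert-base), the bidirectional LSTM size, the dropout rate, and the mask-aware pooling below are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class CodeBertLstmClassifier(nn.Module):
    """Sketch of a hybrid CodeBERT + LSTM classifier for five review-comment classes."""

    def __init__(self, encoder_name="microsoft/codebert-base", lstm_hidden=256, num_classes=5):
        super().__init__()
        # Pre-trained CodeBERT encoder producing contextual token embeddings.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        # Bidirectional LSTM over the encoder's token sequence (hidden size is an assumption).
        self.lstm = nn.LSTM(input_size=self.encoder.config.hidden_size,
                            hidden_size=lstm_hidden,
                            batch_first=True,
                            bidirectional=True)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(2 * lstm_hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        token_states = self.encoder(input_ids=input_ids,
                                    attention_mask=attention_mask).last_hidden_state
        lstm_out, _ = self.lstm(token_states)
        # Mean-pool LSTM outputs over non-padding tokens to get one vector per comment.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (lstm_out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return self.classifier(self.dropout(pooled))

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = CodeBertLstmClassifier()
batch = tokenizer(["Please add a docstring explaining the return value."],
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # torch.Size([1, 5]) -- one score per comment class

Training such a model with a standard cross-entropy loss and reporting per-class precision, recall, F1-score, and accuracy (for example, via scikit-learn's classification_report) would correspond to the evaluation metrics listed in the abstract; the exact training setup used by the authors is not specified here.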



Published

02.06.2024

How to Cite

Gobind Panditrao. (2024). Automated Classification of Code Review Comments using Deep Neural Network-based Architecture. International Journal of Intelligent Systems and Applications in Engineering, 12(3), 4122–4134. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/6116

Issue

Vol. 12 No. 3 (2024)

Section

Research Article