Automatic Program Repair: A Comparative Study of LLMs on QuixBugs

Authors

  • Poonam Ponde, Manisha Bharambe, Vinaya Keskar, Harshita Vachhani

Keywords:

Bugs, Debugging, Automatic Program Repair, ChatGPT, Gemini.

Abstract

Software bugs are errors or flaws in a program's code that can lead to incorrect or unexpected behavior, so detecting and resolving them is crucial for reliable and secure software development. Debugging is a human-centric, time-consuming, and resource-intensive process, which makes it one of the most expensive phases of software development. Automatic Program Repair (APR) is an emerging area of research that aims to fix software bugs automatically with minimal human intervention. Traditional APR tools use search-based or learning-based techniques to find and fix software bugs based on test suites and bug patterns, and therefore rely heavily on test cases. AI-driven APR tools are trained on large-scale codebases, open-source bug-fix histories, and benchmarks such as QuixBugs. They can analyze buggy code and generate patches that are syntactically and semantically correct, which reduces debugging time and improves software reliability. The QuixBugs benchmark consists of 40 programs from the Quixey Challenge, each provided in two languages, Python and Java. Each program contains a one-line defect and is accompanied by failing test cases. This paper evaluates and compares the automatic bug-fixing capability of LLMs such as ChatGPT and Google Gemini on the QuixBugs benchmark, thereby contributing to the understanding of the role of LLMs in automatic program repair.
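To make the notion of a one-line defect with failing test cases concrete, the following is a minimal sketch in the style of a QuixBugs program. It is illustrative only: the function names, the test inputs, and the pass/fail criterion (a candidate patch counts as plausible when it passes every test case) are assumptions made for this example and are not taken from the paper's evaluation protocol.

```python
from typing import Callable, List, Tuple

# Each test case pairs an input list with the expected result (hypothetical data).
TestCase = Tuple[List[int], int]


def buggy_max_sublist_sum(arr: List[int]) -> int:
    """Maximum sum of a contiguous sublist, with a one-line defect."""
    max_ending_here = 0
    max_so_far = 0
    for x in arr:
        # Defect: the running sum is never reset when it drops below zero.
        max_ending_here = max_ending_here + x
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far


def patched_max_sublist_sum(arr: List[int]) -> int:
    """Candidate repair: only the defective line is changed."""
    max_ending_here = 0
    max_so_far = 0
    for x in arr:
        # Fix: restart the running sum at zero when it becomes negative.
        max_ending_here = max(0, max_ending_here + x)
        max_so_far = max(max_so_far, max_ending_here)
    return max_so_far


TESTS: List[TestCase] = [
    ([4, -5, 2, 1, -1, 3], 5),
    ([1, 2, 3], 6),
    ([-2, -1], 0),
    ([], 0),
]


def passes_all_tests(candidate: Callable[[List[int]], int],
                     tests: List[TestCase]) -> bool:
    """Return True only if the candidate gives the expected output on every test."""
    for args, expected in tests:
        try:
            if candidate(list(args)) != expected:
                return False
        except Exception:
            # A crashing candidate is treated as a failed repair.
            return False
    return True


if __name__ == "__main__":
    print("buggy passes:", passes_all_tests(buggy_max_sublist_sum, TESTS))
    print("patched passes:", passes_all_tests(patched_max_sublist_sum, TESTS))
```

In this sketch the buggy version fails only the first test case, which is the kind of failing test that exposes a one-line defect. Passing all test cases shows that a patch is plausible, not that it is correct in general, so a manual check of the patch may still be needed.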

Published

26.12.2024

How to Cite

Poonam Ponde. (2024). Automatic Program Repair: A Comparative Study of LLMs on QuixBugs. International Journal of Intelligent Systems and Applications in Engineering, 12(23s), 3381 –. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/7707

Section

Research Article