SFT for Improved Text-to-SQL Translation
Keywords:
SQL, LLM, SFT, Language, Data Analytics, Translation
Abstract
Large Language Models (LLMs) have demonstrated significant proficiency in code generation, particularly in Structured Query Language (SQL) for databases, and recent successful Text-to-SQL methods fine-tune pre-trained LLMs for SQL generation. Translating natural language text into SQL queries has been approached with a variety of learning techniques, including few-shot learning [1] and fine-tuning. In this paper we propose supervised fine-tuning (SFT) of Code Llama as a stronger learning technique for the Text-to-SQL task: it raises state-of-the-art test-suite accuracy on the Spider dev set to 89.6%, the first result to surpass the earlier best-in-class, exceeding it by 5.5%, and achieves 86.8% exact-match accuracy on the dev set. Furthermore, we demonstrate that a properly prompted, SFT-trained LLM hallucinates far less and is considerably more robust, so it can serve as a general tool for Text-to-SQL generation use cases.
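To make the approach concrete, the sketch below shows one plausible way to run SFT on Code Llama over Spider-style (schema, question, SQL) pairs, using the Hugging Face trl library with a LoRA adapter. The checkpoint name, data file, prompt template, and hyperparameters are illustrative assumptions, not the configuration reported in the paper, and trl's argument names vary slightly across versions.

# Minimal SFT sketch for text-to-SQL with Code Llama (illustrative, not the
# paper's exact setup). Assumes the Hugging Face datasets/peft/trl stack.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

def to_prompt(example):
    # Serialize the schema and question, then append the gold SQL as the
    # completion the model should learn to produce.
    return {"text": (f"-- Schema:\n{example['schema']}\n"
                     f"-- Question: {example['question']}\n"
                     f"{example['sql']}")}

# Spider-style training pairs; the file name and field names are assumptions.
train = Dataset.from_json("spider_train.json").map(to_prompt)

trainer = SFTTrainer(
    model="codellama/CodeLlama-7b-hf",  # illustrative checkpoint size
    train_dataset=train,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                           target_modules=["q_proj", "v_proj"]),
    args=SFTConfig(output_dir="codellama-sql-sft",
                   dataset_text_field="text",
                   max_seq_length=2048,  # renamed in some newer trl releases
                   per_device_train_batch_size=4,
                   num_train_epochs=3),
)
trainer.train()

At inference time the same schema-plus-question prompt is supplied and the model's completion is taken as the predicted SQL; the abstract's claim is that this supervised pass, combined with careful prompting, yields fewer hallucinated tables and columns than purely in-context approaches.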
References
Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a Few Examples: A Survey on Few-Shot Learning,” ACM Comput Surv, vol. 53, no. 3, Apr. 2019, doi: 10.1145/3386252.
H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL,” Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, vol. 37, pp. 13067–13075, Feb. 2023, doi: 10.1609/aaai.v37i11.26535.
T. Scholak, N. Schucher, and D. Bahdanau, “PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,” EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, pp. 9895–9901, Sep. 2021, doi: 10.18653/v1/2021.emnlp-main.779.
J. Qi et al., “RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL,” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, pp. 3215–3229, May 2022, doi: 10.18653/v1/2022.emnlp-main.211.
T. B. Brown et al., “Language Models are Few-Shot Learners,” Adv Neural Inf Process Syst, vol. 2020-December, May 2020, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2005.14165v4
A. Chowdhery et al., “PaLM: Scaling Language Modeling with Pathways,” Apr. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2204.02311v5
OpenAI, “ChatGPT.” Accessed: Jan. 25, 2024. [Online]. Available: https://chat.openai.com/chat
OpenAI, “GPT-4 Technical Report,” Mar. 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
Google, “PaLM 2 Technical Report,” May 2023. [Online]. Available: https://arxiv.org/abs/2305.10403
J. Wei et al., “Emergent Abilities of Large Language Models,” Jun. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2206.07682v2
M. Chen et al., “Evaluating Large Language Models Trained on Code,” Jul. 2021, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2107.03374v2
A. Liu, X. Hu, L. Wen, and P. S. Yu, “A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability,” Mar. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2303.13547v1
R. Sun et al., “SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL,” May 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2306.00739v3
D. Zhou et al., “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models,” May 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2205.10625v3
M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction,” Apr. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2304.11015v3
J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Adv Neural Inf Process Syst, vol. 35, Jan. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2201.11903v6
X. Wang et al., “Self-Consistency Improves Chain of Thought Reasoning in Language Models,” Mar. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2203.11171v4
A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models,” Jun. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2206.04615v3
T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task,” Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, pp. 3911–3921, Sep. 2018, doi: 10.18653/v1/d18-1425.
Y. Gan et al., “Towards Robustness of Text-to-SQL Models against Synonym Substitution,” ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 2505–2515, Jun. 2021, doi: 10.18653/v1/2021.acl-long.195.
X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp. 1337–1350, Oct. 2020, doi: 10.18653/v1/2021.naacl-main.105.
Y. Gan, X. Chen, and M. Purver, “Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization,” EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, pp. 8926–8931, Sep. 2021, doi: 10.18653/v1/2021.emnlp-main.702.
H. Touvron et al., “Llama 2: Open Foundation and Fine-Tuned Chat Models,” Jul. 2023, Accessed: Jan. 28, 2024. [Online]. Available: https://arxiv.org/abs/2307.09288v2
B. Rozière et al., “Code Llama: Open Foundation Models for Code,” Aug. 2023, Accessed: Jan. 28, 2024. [Online]. Available: https://arxiv.org/abs/2308.12950v2
N. Rajkumar, R. Li, and D. Bahdanau, “Evaluating the Text-to-SQL Capabilities of Large Language Models,” Mar. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2204.00498v1
X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching Large Language Models to Self-Debug,” Apr. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2304.05128v2
T. Yu et al., “GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing,” ICLR 2021 - 9th International Conference on Learning Representations, Sep. 2020, Accessed: Jan. 27, 2024. [Online]. Available: https://arxiv.org/abs/2009.13845v2
Y. Gan et al., “Natural SQL: Making SQL Easier to Infer from Natural Language Specifications,” Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, pp. 2030–2042, Sep. 2021, doi: 10.18653/v1/2021.findings-emnlp.174.
P. Xu et al., “Optimizing Deeper Transformers on Small Datasets,” Accessed: Jan. 27, 2024. [Online]. Available: https://github.com/BorealisAI/DT-Fixup
R. Cao, L. Chen, Z. Chen, Y. Zhao, S. Zhu, and K. Yu, “LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations,” ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 2541–2555, Jun. 2021, doi: 10.18653/v1/2021.acl-long.198.
B. Hui et al., “S²SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers,” Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp. 1254–1262, Mar. 2022, doi: 10.18653/v1/2022.findings-acl.99.