SFT for Improved Text-to-SQL Translation

Authors

  • Puneet Kumar Ojha, B.Tech in Bioinformatics, Co-Founder, Attentions Data Labs Pvt. Ltd.
  • Abhishek Gautam, B.Tech in Computer Science, Data Scientist, Attentions Data Labs Pvt. Ltd.; Indian Institute of Information Technology (IIIT) Una
  • Ankit Agrahari, Master's in Machine Learning and Artificial Intelligence, Co-Founder, Attentions Data Labs Pvt. Ltd.; Liverpool John Moores University
  • Parikshit Singh, B.Tech in Computer Science, Solution Architect, Attentions Data Labs Pvt. Ltd.; United College of Engineering, Allahabad

Keywords:

SQL, LLM, SFT, Language, Data Analytics, Translation

Abstract

Large Language Models (LLMs) have demonstrated significant proficiency in code generation, particularly in Structured Query Language (SQL) for databases, and recent successful Text-to-SQL methods involve fine-tuning pre-trained LLMs for the SQL generation task. Transforming natural language text into SQL queries has been attempted with various learning techniques, including few-shot learning [1] and fine-tuning. In this paper we propose supervised fine-tuning (SFT) as a better learning technique for the text-to-SQL generation task using Code Llama, pushing state-of-the-art accuracy on the Spider test suite to 89.6% on the dev set, the first result to surpass the earlier best-in-class score, by a margin of 5.5%, along with 86.8% exact-match accuracy on the dev set. Furthermore, we demonstrate that a properly prompted LLM combined with SFT yields far fewer hallucinations and a much more robust model that can serve as a general tool for any text-to-SQL generation use case.
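
To make the setup concrete, the sketch below illustrates one way such supervised fine-tuning could be arranged with the Hugging Face transformers library: Spider-style (schema, question, SQL) triples are serialized into a single prompt string and the model is trained with a standard causal language-modeling objective. This is a minimal illustration only, not the authors' training recipe; the Code Llama checkpoint name, prompt template, data path, and hyperparameters are assumptions.

```python
# A minimal, illustrative SFT setup for text-to-SQL (not the authors' exact recipe).
# The checkpoint name, prompt template, data path, and hyperparameters are assumptions.
import json

import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_NAME = "codellama/CodeLlama-7b-hf"  # assumed Code Llama variant

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)


def build_prompt(schema: str, question: str, sql: str) -> str:
    """Serialize one Spider-style (schema, question, SQL) triple into a training string."""
    return (f"-- Database schema:\n{schema}\n"
            f"-- Question: {question}\n"
            f"-- SQL:\n{sql}{tokenizer.eos_token}")


class TextToSQLDataset(Dataset):
    """Wraps a JSON file of {"schema", "question", "query"} records (hypothetical format)."""

    def __init__(self, path: str, max_len: int = 1024):
        with open(path) as f:
            records = json.load(f)
        self.examples = [
            tokenizer(build_prompt(r["schema"], r["question"], r["query"]),
                      truncation=True, max_length=max_len)
            for r in records
        ]

    def __len__(self):
        return len(self.examples)

    def __getitem__(self, idx):
        return self.examples[idx]


train_dataset = TextToSQLDataset("spider_train.json")  # hypothetical pre-processed file

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="codellama-text2sql-sft",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-5,
        bf16=True,
        logging_steps=50,
    ),
    train_dataset=train_dataset,
    # Causal-LM collator: copies input_ids into labels and masks padding with -100.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

At inference time the same schema-plus-question prompt would be supplied without the SQL, and the model's completion is taken as the predicted query; restricting the prompt to the relevant schema is one way to realize the "properly prompted" setup the abstract alludes to.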

References

Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a Few Examples: A Survey on Few-Shot Learning,” ACM Comput Surv, vol. 53, no. 3, Apr. 2019, doi: 10.1145/3386252.

H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL,” Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, vol. 37, pp. 13067–13075, Feb. 2023, doi: 10.1609/aaai.v37i11.26535.

T. Scholak, N. Schucher, and D. Bahdanau, “PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,” EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, pp. 9895–9901, Sep. 2021, doi: 10.18653/v1/2021.emnlp-main.779.

J. Qi et al., “RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL,” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, pp. 3215–3229, May 2022, doi: 10.18653/v1/2022.emnlp-main.211.

T. B. Brown et al., “Language Models are Few-Shot Learners,” Adv Neural Inf Process Syst, vol. 2020-December, May 2020, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2005.14165v4

A. Chowdhery et al., “PaLM: Scaling Language Modeling with Pathways,” Apr. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2204.02311v5

“ChatGPT.” Accessed: Jan. 25, 2024. [Online]. Available: https://chat.openai.com/chat

OpenAI, “GPT-4 Technical Report”.

Google, “PaLM 2 Technical Report”.

J. Wei et al., “Emergent Abilities of Large Language Models,” Jun. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2206.07682v2

M. Chen et al., “Evaluating Large Language Models Trained on Code,” Jul. 2021, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2107.03374v2

A. Liu, X. Hu, L. Wen, and P. S. Yu, “A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability,” Mar. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2303.13547v1

R. Sun et al., “SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL,” May 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2306.00739v3

D. Zhou et al., “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models,” May 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2205.10625v3

M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction,” Apr. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2304.11015v3

J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Adv Neural Inf Process Syst, vol. 35, Jan. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2201.11903v6

X. Wang et al., “Self-Consistency Improves Chain of Thought Reasoning in Language Models,” Mar. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2203.11171v4

A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models,” Jun. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2206.04615v3

T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task,” Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, pp. 3911–3921, Sep. 2018, doi: 10.18653/v1/d18-1425.

Y. Gan et al., “Towards Robustness of Text-to-SQL Models against Synonym Substitution,” ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 2505–2515, Jun. 2021, doi: 10.18653/v1/2021.acl-long.195.

X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp. 1337–1350, Oct. 2020, doi: 10.18653/v1/2021.naacl-main.105.

Y. Gan, X. Chen, and M. Purver, “Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization,” EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, pp. 8926–8931, Sep. 2021, doi: 10.18653/v1/2021.emnlp-main.702.

H. Touvron et al., “Llama 2: Open Foundation and Fine-Tuned Chat Models,” Jul. 2023, Accessed: Jan. 28, 2024. [Online]. Available: https://arxiv.org/abs/2307.09288v2

B. Rozière et al., “Code Llama: Open Foundation Models for Code,” Aug. 2023, Accessed: Jan. 28, 2024. [Online]. Available: https://arxiv.org/abs/2308.12950v2

N. Rajkumar, R. Li, and D. Bahdanau, “Evaluating the Text-to-SQL Capabilities of Large Language Models,” Mar. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2204.00498v1

X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching Large Language Models to Self-Debug,” Apr. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2304.05128v2

T. Yu et al., “GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing,” ICLR 2021 - 9th International Conference on Learning Representations, Sep. 2020, Accessed: Jan. 27, 2024. [Online]. Available: https://arxiv.org/abs/2009.13845v2

Y. Gan et al., “Natural SQL: Making SQL Easier to Infer from Natural Language Specifications,” Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, pp. 2030–2042, Sep. 2021, doi: 10.18653/v1/2021.findings-emnlp.174.

P. Xu et al., “Optimizing Deeper Transformers on Small Datasets”, Accessed: Jan. 27, 2024. [Online]. Available: https://github.com/BorealisAI/DT-Fixup

R. Cao, L. Chen, Z. Chen, Y. Zhao, S. Zhu, and K. Yu, “LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations,” ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 2541–2555, Jun. 2021, doi: 10.18653/v1/2021.acl-long.198.

B. Hui et al., “S²SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers,” Findings of the Association for Computational Linguistics: ACL 2022, pp. 1254–1262, Mar. 2022, doi: 10.18653/v1/2022.findings-acl.99.

Published

23.02.2024

How to Cite

Ojha, P. K., Gautam, A., Agrahari, A., & Singh, P. (2024). SFT for Improved Text-to-SQL Translation. International Journal of Intelligent Systems and Applications in Engineering, 12(17s), 700–705. Retrieved from https://ijisae.org/index.php/IJISAE/article/view/4938

Section

Research Article