SFT for Improved Text-to-SQL Translation
Keywords:
SQL, LLM, SFT, Language, Data Analytics, Translation
Abstract
Large Language Models (LLMs) have demonstrated significant proficiency in code generation, particularly in Structured Query Language (SQL) for databases, and recent successful Text-to-SQL methods fine-tune pre-trained LLMs for SQL generation. Translating natural language text into SQL queries has been approached with a variety of learning techniques, including few-shot learning [1] and fine-tuning. In this paper we propose supervised fine-tuning (SFT) of Code Llama as a stronger learning technique for the Text-to-SQL task: it raises state-of-the-art test-suite accuracy on the Spider dev set to 89.6%, the first result to surpass the earlier best-in-class, exceeding it by 5.5%, and achieves 86.8% exact-match accuracy on the dev set. Furthermore, we demonstrate that a properly prompted, SFT-trained LLM hallucinates far less and is considerably more robust, so it can serve as a general tool for Text-to-SQL generation use cases.
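To make the approach concrete, the sketch below shows one plausible way to run SFT on Code Llama over Spider-style (schema, question, SQL) pairs, using the Hugging Face trl library with a LoRA adapter. The checkpoint name, data file, prompt template, and hyperparameters are illustrative assumptions, not the configuration reported in the paper, and trl's argument names vary slightly across versions.

# Minimal SFT sketch for text-to-SQL with Code Llama (illustrative, not the
# paper's exact setup). Assumes the Hugging Face datasets/peft/trl stack.
from datasets import Dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

def to_prompt(example):
    # Serialize the schema and question, then append the gold SQL as the
    # completion the model should learn to produce.
    return {"text": (f"-- Schema:\n{example['schema']}\n"
                     f"-- Question: {example['question']}\n"
                     f"{example['sql']}")}

# Spider-style training pairs; the file name and field names are assumptions.
train = Dataset.from_json("spider_train.json").map(to_prompt)

trainer = SFTTrainer(
    model="codellama/CodeLlama-7b-hf",  # illustrative checkpoint size
    train_dataset=train,
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                           target_modules=["q_proj", "v_proj"]),
    args=SFTConfig(output_dir="codellama-sql-sft",
                   dataset_text_field="text",
                   max_seq_length=2048,  # renamed in some newer trl releases
                   per_device_train_batch_size=4,
                   num_train_epochs=3),
)
trainer.train()

At inference time the same schema-plus-question prompt is supplied and the model's completion is taken as the predicted SQL; the abstract's claim is that this supervised pass, combined with careful prompting, yields fewer hallucinated tables and columns than purely in-context approaches.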
References
Y. Wang, Q. Yao, J. T. Kwok, and L. M. Ni, “Generalizing from a Few Examples: A Survey on Few-Shot Learning,” ACM Comput Surv, vol. 53, no. 3, Apr. 2019, doi: 10.1145/3386252.
H. Li, J. Zhang, C. Li, and H. Chen, “RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL,” Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023, vol. 37, pp. 13067–13075, Feb. 2023, doi: 10.1609/aaai.v37i11.26535.
T. Scholak, N. Schucher, and D. Bahdanau, “PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models,” EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, pp. 9895–9901, Sep. 2021, doi: 10.18653/v1/2021.emnlp-main.779.
J. Qi et al., “RASAT: Integrating Relational Structures into Pretrained Seq2Seq Model for Text-to-SQL,” Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, pp. 3215–3229, May 2022, doi: 10.18653/v1/2022.emnlp-main.211.
T. B. Brown et al., “Language Models are Few-Shot Learners,” Adv Neural Inf Process Syst, vol. 2020-December, May 2020, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2005.14165v4
A. Chowdhery et al., “PaLM: Scaling Language Modeling with Pathways,” Apr. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2204.02311v5
OpenAI, “ChatGPT.” Accessed: Jan. 25, 2024. [Online]. Available: https://chat.openai.com/chat
OpenAI, “GPT-4 Technical Report,” Mar. 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
Google, “PaLM 2 Technical Report,” May 2023. [Online]. Available: https://arxiv.org/abs/2305.10403
J. Wei et al., “Emergent Abilities of Large Language Models,” Jun. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2206.07682v2
M. Chen et al., “Evaluating Large Language Models Trained on Code,” Jul. 2021, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2107.03374v2
A. Liu, X. Hu, L. Wen, and P. S. Yu, “A comprehensive evaluation of ChatGPT’s zero-shot Text-to-SQL capability,” Mar. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2303.13547v1
R. Sun et al., “SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL,” May 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2306.00739v3
D. Zhou et al., “Least-to-Most Prompting Enables Complex Reasoning in Large Language Models,” May 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2205.10625v3
M. Pourreza and D. Rafiei, “DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction,” Apr. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2304.11015v3
J. Wei et al., “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models,” Adv Neural Inf Process Syst, vol. 35, Jan. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2201.11903v6
X. Wang et al., “Self-Consistency Improves Chain of Thought Reasoning in Language Models,” Mar. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2203.11171v4
A. Srivastava et al., “Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models,” Jun. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2206.04615v3
T. Yu et al., “Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task,” Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018, pp. 3911–3921, Sep. 2018, doi: 10.18653/v1/d18-1425.
Y. Gan et al., “Towards Robustness of Text-to-SQL Models against Synonym Substitution,” ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 2505–2515, Jun. 2021, doi: 10.18653/v1/2021.acl-long.195.
X. Deng, A. H. Awadallah, C. Meek, O. Polozov, H. Sun, and M. Richardson, “Structure-Grounded Pretraining for Text-to-SQL,” NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, pp. 1337–1350, Oct. 2020, doi: 10.18653/v1/2021.naacl-main.105.
Y. Gan, X. Chen, and M. Purver, “Exploring Underexplored Limitations of Cross-Domain Text-to-SQL Generalization,” EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings, pp. 8926–8931, Sep. 2021, doi: 10.18653/v1/2021.emnlp-main.702.
H. Touvron et al., “Llama 2: Open Foundation and Fine-Tuned Chat Models,” Jul. 2023, Accessed: Jan. 28, 2024. [Online]. Available: https://arxiv.org/abs/2307.09288v2
B. Rozière et al., “Code Llama: Open Foundation Models for Code,” Aug. 2023, Accessed: Jan. 28, 2024. [Online]. Available: https://arxiv.org/abs/2308.12950v2
N. Rajkumar, R. Li, and D. Bahdanau, “Evaluating the Text-to-SQL Capabilities of Large Language Models,” Mar. 2022, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2204.00498v1
X. Chen, M. Lin, N. Schärli, and D. Zhou, “Teaching Large Language Models to Self-Debug,” Apr. 2023, Accessed: Jan. 25, 2024. [Online]. Available: https://arxiv.org/abs/2304.05128v2
T. Yu et al., “GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing,” ICLR 2021 - 9th International Conference on Learning Representations, Sep. 2020, Accessed: Jan. 27, 2024. [Online]. Available: https://arxiv.org/abs/2009.13845v2
Y. Gan et al., “Natural SQL: Making SQL Easier to Infer from Natural Language Specifications,” Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, pp. 2030–2042, Sep. 2021, doi: 10.18653/v1/2021.findings-emnlp.174.
P. Xu et al., “Optimizing Deeper Transformers on Small Datasets,” Accessed: Jan. 27, 2024. [Online]. Available: https://github.com/BorealisAI/DT-Fixup
R. Cao, L. Chen, Z. Chen, Y. Zhao, S. Zhu, and K. Yu, “LGESQL: Line Graph Enhanced Text-to-SQL Model with Mixed Local and Non-Local Relations,” ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference, pp. 2541–2555, Jun. 2021, doi: 10.18653/v1/2021.acl-long.198.
B. Hui et al., “S²SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers,” Proceedings of the Annual Meeting of the Association for Computational Linguistics, pp. 1254–1262, Mar. 2022, doi: 10.18653/v1/2022.findings-acl.99.