Oral in Workshop: Table Representation Learning Workshop (TRL)
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models
Karime Maamari · Fadhil Abubaker · Daniel Jaroslawicz · Amine Mhedhbi
Keywords: Natural Language Interfaces to Databases; BIRD Benchmark; Schema Linking; Text-to-SQL
In Text-to-SQL pipelines, schema linking is used to retrieve tables and columns that are relevant to the user's natural language query. However, inaccuracies in schema linking can lead to the exclusion of crucial information, which in turn adversely affects SQL generation. In this work, we revisit the need for schema linking when using the latest generation of large language models (LLMs). We find that newer models can accurately identify relevant schema during SQL generation, even in the presence of substantial irrelevant data. Consequently, our Text-to-SQL pipeline forgoes schema linking when the entire database schema fits within the model's context window. This approach eliminates errors due to faulty schema linking by ensuring that no schema information is omitted. Furthermore, we introduce techniques such as augmentation, selection, and correction, which improve Text-to-SQL accuracy without the risk of filtering out essential schema information. Our approach ranks first on the BIRD benchmark, achieving an accuracy of 71.83%.
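The core decision rule above is to skip schema linking whenever the whole schema fits in the model's context window. It can be illustrated with a minimal Python sketch under several stated assumptions. The schema is modeled as a dict from table names to column lists. `count_tokens` is a crude whitespace approximation standing in for the model's real tokenizer. `reserved_tokens` is a hypothetical budget held back for the question, instructions, and generated SQL. `link_schema` is a hypothetical fallback linker. The abstract does not say what the pipeline does when the schema does not fit, so that branch is an assumption. All function names are illustrative, not the authors' code.

```python
from typing import Callable, Dict, List

# Hypothetical schema representation: table name -> list of column names.
Schema = Dict[str, List[str]]


def render_schema(schema: Schema) -> str:
    """Serialize the schema as one CREATE TABLE-style line per table (assumed prompt format)."""
    return "\n".join(
        f"CREATE TABLE {table} ({', '.join(columns)});"
        for table, columns in schema.items()
    )


def count_tokens(text: str) -> int:
    """Crude stand-in for the model's tokenizer: counts whitespace-separated words."""
    return len(text.split())


def select_schema(
    schema: Schema,
    context_window: int,
    reserved_tokens: int,
    link_schema: Callable[[Schema], Schema],
) -> Schema:
    """Return the schema to place in the SQL-generation prompt.

    If the full serialized schema fits in the budget left after reserving room
    for the question, instructions, and output, skip schema linking so that no
    table or column can be filtered out. Otherwise fall back to a
    (hypothetical) schema linker.
    """
    budget = context_window - reserved_tokens
    if count_tokens(render_schema(schema)) <= budget:
        return schema  # full schema: nothing is omitted
    return link_schema(schema)  # assumed fallback when the schema is too large


if __name__ == "__main__":
    toy = {"users": ["id", "name"], "orders": ["id", "user_id", "total"]}
    keep_users = lambda s: {"users": s["users"]}  # toy linker for illustration only
    # Budget of 90 tokens easily fits the 11-token schema, so it is passed whole.
    print(select_schema(toy, context_window=100, reserved_tokens=10, link_schema=keep_users))
    # Budget of 5 tokens is too small, so the toy linker's output is used instead.
    print(select_schema(toy, context_window=15, reserved_tokens=10, link_schema=keep_users))
```

The check is done on the serialized prompt text rather than on table or column counts, because the context window limit is measured in tokens. In practice one would swap `count_tokens` for the target model's tokenizer.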