Poster in Workshop: Table Representation Learning Workshop (TRL)
RES-RAG: Residual-aware RAG for Realistic Tabular Data Generation
Liancheng Fang · Aiwei Liu · Hengrui Zhang · Henry Zou · Weizhi Zhang · Philip S Yu
Keywords: [ RAG ] [ Tabular data generation ]
LLM-based synthetic data generation has been widely explored across many domains, but its application to tabular data remains relatively underexplored. This paper examines whether a pretrained LLM can generate realistic tabular data through effective prompting. We introduce RES-RAG, a residual-aware prompting framework designed for tabular data generation. In each iteration, RES-RAG retrieves a subset of real samples that acts as a residual between the currently generated samples and the true data. This approach 1) supplies the LLM with more effective in-context examples in each iteration and 2) progressively narrows the gap between the generated samples and the real data distribution. Extensive experiments on five real-world tabular datasets show that RES-RAG significantly improves the quality of generated samples. This is the first work to demonstrate that prompting a fixed LLM can yield high-quality synthetic tabular data. The code is available at https://anonymous.4open.science/r/RES-RAG-3E16.
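The iterative retrieval described in the abstract can be illustrated with a rough sketch. The abstract does not define how the residual is computed, so everything below is an assumption made for illustration: the residual is taken to be the real rows farthest (in Euclidean distance) from any generated row, features are assumed numeric, and the names `select_residual`, `residual_loop`, and `echo_generator` are hypothetical. The LLM call is replaced by a stand-in function. This is not the authors' implementation.

```python
import numpy as np


def select_residual(real, generated, k):
    """Return indices of the k real rows least covered by the generated rows.

    Assumption: "least covered" means largest distance to the nearest
    generated row. The paper may define the residual differently.
    """
    if len(generated) == 0:
        # Nothing generated yet: fall back to the first k real rows.
        return np.arange(min(k, len(real)))
    # Pairwise squared Euclidean distances, shape (n_real, n_generated).
    diffs = real[:, None, :] - generated[None, :, :]
    d2 = (diffs ** 2).sum(axis=-1)
    # Distance from each real row to its nearest generated row.
    nearest = d2.min(axis=1)
    # Largest distances first: these rows are the worst covered.
    return np.argsort(-nearest, kind="stable")[:k]


def residual_loop(real, generate_fn, n_iters, k):
    """Alternate between retrieving residual rows and generating new rows."""
    generated = np.empty((0, real.shape[1]))
    for _ in range(n_iters):
        idx = select_residual(real, generated, k)
        # generate_fn stands in for prompting the LLM with in-context examples.
        new_rows = generate_fn(real[idx])
        generated = np.vstack([generated, new_rows])
    return generated


def echo_generator(examples):
    # Hypothetical placeholder for the LLM: returns the examples unchanged.
    return examples.copy()


real = np.array([[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]])
synthetic = residual_loop(real, echo_generator, n_iters=2, k=2)
```

With the placeholder generator, the first iteration covers the cluster near the origin and the second iteration retrieves the two rows near (10, 10), which is the gap-narrowing behaviour the abstract describes. A real pipeline would serialize the retrieved rows into a prompt and parse the LLM's output back into rows.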