Talk in Workshop: Table Representation Learning
Huan Sun - "Self-supervised Pre-training on Tables"
Pre-training/fine-tuning paradigms have transformed the field of natural language processing, but for table-based tasks their potential has been far less explored. In this talk, I will discuss recent efforts led by my Ph.D. student Xiang Deng: (1) TURL, a pre-training/fine-tuning paradigm on relational Web tables that benefits a wide range of table understanding tasks (e.g., row population, relation extraction, entity linking); this work won the ACM SIGMOD Research Highlight Award in 2022. (2) StruG, a weakly supervised Structure-Grounded pretraining framework for text-to-SQL that effectively learns to capture the text-table alignment essential for the task; when we tested our model on the Spider leaderboard in 2020, it ranked 6th under the setting using DB content and 1st under the setting without DB content. (3) ReasonBERT, a pre-training method that augments language models for multi-step reasoning over hybrid (textual and tabular) contexts. Among these, I will cover TURL in greater detail. Finally, I will conclude the talk with my thoughts on promising future directions.
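To make the idea of self-supervised pre-training on tables concrete, here is a minimal, illustrative sketch of one common recipe: linearize a relational table into a token sequence and mask some cell values so the model must recover them. The special markers, masking rate, and helper names (`linearize_table`, `mask_cells`) are assumptions chosen for illustration only; they are not TURL's actual encoding, architecture, or objective.

```python
import random

def linearize_table(caption, headers, rows):
    """Flatten a relational table into a token sequence.
    Layout (an illustrative convention, not TURL's encoding):
    [CAP] caption [HDR] h1 | h2 | ... [ROW] c1 | c2 | ... [ROW] ...
    """
    tokens = ["[CAP]"] + caption.split() + ["[HDR]"]
    tokens += " | ".join(headers).split()
    for row in rows:
        tokens += ["[ROW]"] + " | ".join(row).split()
    return tokens

def mask_cells(rows, mask_prob=0.2, mask_token="[MASK]"):
    """Randomly hide cell values to create a self-supervised
    cell-recovery objective: masked cells become prediction targets."""
    masked_rows, targets = [], []
    for row in rows:
        new_row = []
        for cell in row:
            if random.random() < mask_prob:
                new_row.append(mask_token)
                targets.append(cell)
            else:
                new_row.append(cell)
        masked_rows.append(new_row)
    return masked_rows, targets

if __name__ == "__main__":
    headers = ["Country", "Capital", "Population (M)"]
    rows = [["France", "Paris", "67"],
            ["Japan", "Tokyo", "125"]]
    masked, targets = mask_cells(rows, mask_prob=0.3)
    print(linearize_table("List of countries", headers, masked))
    print("Cells the model must recover:", targets)
```

In practice, a structure-aware transformer would consume such sequences with table-specific attention and entity-level objectives; the sketch above only shows how the self-supervision signal can be derived from the table itself, without any labels.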