Workshop
Table Representation Learning Workshop (TRL)
Madelon Hulsebos · Haoyu Dong · Laurel Orr · Qian Liu · Vadim Borisov
East Meeting Room 11, 12
Sat 14 Dec, 8:30 a.m. PST
Tables are a promising modality for representation learning and generative models, with too much application potential to ignore. However, tables have long been overlooked despite their dominant presence in the data landscape, e.g. in data management, data analysis, and ML pipelines. The majority of datasets in Google Dataset Search, for example, resemble typical tabular file formats like CSVs. Similarly, the top-3 most-used database management systems are all intended for relational data. Representation learning for tables, possibly combined with other modalities such as code and text, has shown impressive performance for tasks like semantic parsing, question answering, table understanding, data preparation, and data analysis (e.g. text-to-SQL). The pre-training paradigm has been shown to be effective for tabular ML (classification/regression) as well. More recently, we have also observed promising potential in applying and enhancing generative models (e.g. LLMs) in the domain of structured data to improve how we process and derive insights from it.
The Table Representation Learning workshop has been the key venue driving this research vision and establishing a community around TRL. The goal of the third edition of TRL at NeurIPS 2024 is to:
1) showcase the latest impactful TRL research, with a particular focus on industry insights this year,
2) explore new applications, techniques and open challenges for representation learning and generative models for tabular data,
3) facilitate discussion and collaboration across the ML, NLP, and DB communities.
Schedule
Sat 8:30 a.m. - 8:40 a.m.
|
Opening notes ( Opening/closing ) > SlidesLive Video | Madelon Hulsebos 🔗 |
Sat 8:40 a.m. - 9:20 a.m.
|
Gaël Varoquaux (Inria, Probabl): Tabular foundation models for analytics: challenges and progress ( Invited talk ) > link SlidesLive Video | Gaël Varoquaux 🔗 |
Sat 9:20 a.m. - 9:30 a.m.
|
MotherNet: Fast Training and Inference via Hyper-Network Transformers ( Oral ) > link SlidesLive Video | Andreas Mueller · Carlo Curino · Raghu Ramakrishnan 🔗 |
Sat 9:30 a.m. - 9:40 a.m.
|
PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning ( Oral ) > link SlidesLive Video | Weihua Hu · Yiwen Yuan · Zecheng Zhang · Akihiro Nitta · Kaidi Cao · Vid Kocijan · Jinu Sunil · Jure Leskovec · Matthias Fey 🔗 |
Sat 10:00 a.m. - 10:35 a.m.
|
Yasemin Altun (Google DeepMind): Advancements in Structure-Aware Reasoning for Tabular Data ( Invited talk ) > link | Yasemin Altun 🔗 |
Sat 10:35 a.m. - 10:45 a.m.
|
Large Language Models Engineer Too Many Simple Features for Tabular Data ( Oral ) > link SlidesLive Video | Jaris Küken · Lennart Purucker · Frank Hutter 🔗 |
Sat 10:45 a.m. - 10:55 a.m.
|
TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning ( Oral ) > link | Xinyuan Lu · Liangming Pan · Yubo Ma · Preslav Nakov · Min-Yen Kan 🔗 |
Sat 10:45 a.m. - 10:55 a.m.
|
TabDiff: a Unified Diffusion Model for Multi-Modal Tabular Data Generation ( Oral ) > link SlidesLive Video | Juntong Shi · Minkai Xu · Harper Hua · Hengrui Zhang · Stefano Ermon · Jure Leskovec 🔗 |
Sat 10:55 a.m. - 11:05 a.m.
|
Expertise-Centric Prompting Framework for Financial Tabular Data Generation using Pre-trained Large Language Models ( Oral ) > link SlidesLive Video | Subin Kim · Jungmin Son · Minyoung Jung · Youngjun Kwak 🔗 |
Sat 11:05 a.m. - 11:15 a.m.
|
TabSketchFM: Sketch-based Tabular Representation Learning for Data Discovery over Data Lakes ( Oral ) > link SlidesLive Video | Aamod Khatiwada · Harsha Kokel · Ibrahim Abdelaziz · Subhajit Chaudhury · Julian T Dolby · Oktie Hassanzadeh · Zhenhan Huang · Tejaswini Pedapati · Horst Samulowitz · Kavitha Srinivas 🔗 |
Sat 11:15 a.m. - 12:00 p.m.
|
Poster session 1 ( Poster Session ) > | 🔗 |
Sat 1:30 p.m. - 2:10 p.m.
|
Matei Zaharia (UC Berkeley/Databricks): Lessons from building natural language query interfaces in Databricks AI/BI ( Invited talk ) > SlidesLive Video | Matei Zaharia 🔗 |
Sat 2:10 p.m. - 2:20 p.m.
|
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation ( Oral ) > link SlidesLive Video | Satya Krishna Gorti · Ilan Gofman · Zhaoyan Liu · Jiapeng Wu · Noël Vouitsis · Guangwei Yu · Jesse Cresswell · Rasa Hosseinzadeh 🔗 |
Sat 2:20 p.m. - 2:30 p.m.
|
The Death of Schema Linking? Text-to-SQL in the Age of Well-Reasoned Language Models ( Oral ) > link SlidesLive Video | Karime Maamari · Fadhil Abubaker · Daniel Jaroslawicz · Amine Mhedhbi 🔗 |
Sat 2:30 p.m. - 3:15 p.m.
|
Poster session 2 ( Poster Session ) > | 🔗 |
Sat 3:30 p.m. - 4:10 p.m.
|
Josh Gardner (Apple): Toward Robust, Reliable, and Generalizable Tabular Data Models ( Invited talk ) > SlidesLive Video | Josh Gardner 🔗 |
Sat 4:10 p.m. - 4:50 p.m.
|
Panel TRL in Industry [tbc] ( Panel ) > SlidesLive Video | Xiao Ling · Shivam Singhal · Douwe Kiela · Maithra Raghu · Binyuan Hui 🔗 |
Sat 4:50 p.m. - 5:00 p.m.
|
Closing notes ( Opening/closing ) > SlidesLive Video | Qian Liu 🔗 |
-
|
On Short Textual Value Column Representation Using Symbol Level Language Models ( Poster ) > link | Ron Begleiter · Nathan Roll 🔗 |
-
|
Lightweight Correlation-Aware Table Compression ( Poster ) > link | Mihail Stoian · Alexander van Renen · Jan Kobiolka · Ping-Lin Kuo · Josif Grabocka · Andreas Kipf 🔗 |
-
|
AdapTable: Test-Time Adaptation for Tabular Data via Shift-Aware Uncertainty Calibrator and Label Distribution Handler ( Poster ) > link | Changhun Kim · Taewon Kim · Seungyeon Woo · June Yong Yang · Eunho Yang 🔗 |
-
|
RACOON: An LLM-based Framework for Retrieval-Augmented Column Type Annotation with a Knowledge Graph ( Poster ) > link | Lindsey Linxi Wei · Guorui Xiao · Magdalena Balazinska 🔗 |
-
|
Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data ( Poster ) > link | David Schnurr · Kai Helli · Noah Hollmann · Samuel Müller · Frank Hutter 🔗 |
-
|
UniTable: Towards a Unified Framework for Table Recognition via Self-Supervised Pretraining ( Poster ) > link | ShengYun Peng · Aishwarya Chakravarthy · Seongmin Lee · Xiaojing Wang · Rajarajeswari Balasubramaniyan · Duen Horng Chau 🔗 |
-
|
DynoClass: A Dynamic Table-Class Detection System Without the Need for Predefined Ontologies ( Poster ) > link | Haonan Wang · Eugene Wu · Kechen Liu · Jiaxiang Liu 🔗 |
-
|
ICE-T: Interactions-aware Cross-column Contrastive Embedding for Heterogeneous Tabular Datasets ( Poster ) > link | Tomas Tokar · Scott Sanner 🔗 |
-
|
Sparsely Connected Layers for Financial Tabular Data ( Poster ) > link | Mohammed Abdulrahman · Yin Wang · Hui Chen 🔗 |
-
|
Automating Enterprise Data Engineering with LLMs ( Poster ) > link | Jan-Micha Bodensohn · Ulf Brackmann · Liane Vogel · Anupam Sanghi · Carsten Binnig 🔗 |
-
|
Improving LLM Group Fairness on Tabular Data via In-Context Learning ( Poster ) > link | Valeriia Cherepanova · Chia-Jung Lee · Nil-Jana Akpinar · Riccardo Fogliato · Martin Bertran · Michael Kearns · James Zou 🔗 |
-
|
Tabular Data Generation using Binary Diffusion ( Poster ) > link | Vitaliy Kinakh · Slava Voloshynovskiy 🔗 |
-
|
AGATa: Attention-Guided Augmentation for Tabular Data in Contrastive Learning ( Poster ) > link | Moonjung Eo · Kyungeun Lee · Min-Kook Suh · Hyeseung Cho · Ye Seul Sim · Woohyung Lim 🔗 |
-
|
Synthetic SQL Column Descriptions and Their Impact on Text-to-SQL Performance ( Poster ) > link | Niklas Wretblad · Oskar Holmström · Erik Larsson · Axel Wiksäter · Hjalmar Öhman · Oscar Söderlund · Ture Pontén · Martin Forsberg · Martin Sörme · Fredrik Heintz 🔗 |
-
|
Enhancing Table Representations for Similar Table Recommendation with Synthetic Data Generation ( Poster ) > link | Dayu Yang · Natawut Monaikul · Amanda Ding · Bozhao Tan · Kishore Mosaliganti · Giridharan Iyengar 🔗 |
-
|
RES-RAG: Residual-aware RAG for Realistic Tabular Data Generation ( Poster ) > link | Liancheng Fang · Aiwei Liu · Hengrui Zhang · Henry Zou · Weizhi Zhang · Philip S Yu 🔗 |
-
|
Tabby: Tabular Adaptation for Language Models ( Poster ) > link | Sonia Cromp · Satya Sai Srinath Namburi · Catherine Cao · Mohammed Alkhudhayri · Samuel Guo · Nicholas Roberts · Frederic Sala 🔗 |
-
|
Recurrent Interpolants for Probabilistic Time Series Prediction ( Poster ) > link | Yu Chen · Marin Biloš · Sarthak Mittal · Wei Deng · Kashif Rasul · Anderson Schneider 🔗 |
-
|
TARGET: Benchmarking Table Retrieval for Generative Tasks ( Poster ) > link | Xingyu Ji · Aditya Parameswaran · Madelon Hulsebos 🔗 |
-
|
Data-Centric Text-to-SQL with Large Language Models ( Poster ) > link | Zachary Huang · Shuo Zhang · Kechen Liu · Eugene Wu 🔗 |
-
|
Relational Deep Learning: Graph Representation Learning on Relational Databases ( Poster ) > link | Joshua Robinson · Rishabh Ranjan · Weihua Hu · Kexin Huang · Jiaqi Han · Alejandro Dobles · Matthias Fey · Jan Eric Lenssen · Yiwen Yuan · Zecheng Zhang · Xinwei He · Jure Leskovec |
-
|
TabFlex: Scaling Tabular Learning to Millions with Linear Attention ( Poster ) > link | Yuchen Zeng · Wonjun Kang · Andreas Mueller 🔗 |
-
|
SynQL: Synthetic Data Generation for In-Domain, Low-Resource Text-to-SQL Parsing ( Poster ) > link | Denver Baumgartner · Tomasz Kornuta 🔗 |
-
|
Augmenting Small-size Tabular Data with Class-Specific Energy-Based Models ( Poster ) > link | Andrei Margeloiu · Xiangjian Jiang · Nikola Simidjievski · Mateja Jamnik 🔗 |
-
|
GAMformer: Exploring In-Context Learning for Generalized Additive Models ( Poster ) > link | Andreas Mueller · Julien Siems · Harsha Nori · David Salinas · Arber Zela · Rich Caruana · Frank Hutter 🔗 |
-
|
Towards Optimizing SQL Generation via LLM Routing ( Poster ) > link | Mohammadhossein Malekpour · Nour Shaheen · Foutse Khomh · Amine Mhedhbi 🔗 |
-
|
SALT: Sales Autocompletion Linked Business Tables Dataset ( Poster ) > link | Tassilo Klein · Clemens Biehl · Margarida Costa · Andre Sres · Jonas Kolk · Johannes Hoffart 🔗 |
-
|
Learnable Numerical Input Normalization for Tabular Representation Learning based on B-splines ( Poster ) > link | Min-Kook Suh · Moonjung Eo · Ye Seul Sim · Woohyung Lim 🔗 |
-
|
PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization ( Poster ) > link | Marco Spinaci · Marek Polewczyk · Johannes Hoffart · Markus Kohler · Sam Thelin · Tassilo Klein 🔗 |
-
|
Multi-Stage QLoRA with Augmented Structured Dialogue Corpora: Efficient and Improved Conversational Healthcare AI ( Poster ) > link | Dasun Wickrama Arachchi Athukoralage · Thushari Atapattu 🔗 |
-
|
Enhancing Biomedical Schema Matching with LLM-based Training Data Generation ( Poster ) > link | Yurong Liu · Aécio Santos · Eduardo Pena · Roque Lopez · Eden Wu · Juliana Freire 🔗 |
-
|
Scalable Representation Learning for Multimodal Tabular Transactions ( Poster ) > link | Natraj Raman · Sumitra Ganesh · Manuela Veloso 🔗 |
-
|
Benchmarking table comprehension in the wild ( Poster ) > link | Yikang Pan · Yi Zhu · Rand Xie · Yizhi Liu 🔗 |
-
|
Relational Data Generation with Graph Neural Networks and Latent Diffusion Models ( Poster ) > link | Valter Hudovernik 🔗 |
-
|
Towards Localization via Data Embedding for TabPFN ( Poster ) > link | Mykhailo Koshil · Thomas Nagler · Matthias Feurer · Katharina Eggensperger 🔗 |
-
|
Unmasking Trees for Tabular Data ( Poster ) > link | Calvin McCarter 🔗 |
-
|
Matchmaker: Self-Improving Compositional LLM Programs for Table Schema Matching ( Poster ) > link | Nabeel Seedat · Mihaela van der Schaar 🔗 |
-
|
Towards Agentic Schema Refinement ( Poster ) > link | Agapi Rissaki · Ilias Fountalis · Nikolaos Vasiloglou · Wolfgang Gatterbauer 🔗 |
-
|
Learning Metadata-Agnostic Representations for Text-to-SQL In-Context Example Selection ( Poster ) > link | Chuhong Mai · Ro-ee Tal · Thahir Mohamed 🔗 |
-
|
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features ( Poster ) > link | Shi Bin Hoo · Samuel Müller · David Salinas · Frank Hutter 🔗 |
-
|
Scaling Generative Tabular Learning for Large Language Models ( Poster ) > link | Yiming Sun · Xumeng Wen · Shun Zheng · Xiaowei Jia · Jiang Bian 🔗 |
-
|
TabDeco: A Comprehensive Contrastive Framework for Decoupled Representations in Tabular Data ( Poster ) > link | Suiyao Chen · Jing Wu · Yunxiao Wang · Cheng Ji · Tianpei Xie · Daniel Cociorva · Michael Sharps · Cecile Levasseur · Hakan Brunzell 🔗 |
-
|
LLM Embeddings Improve Test-time Adaptation to Tabular $Y|X$-Shifts ( Poster ) > link | Yibo Zeng · Jiashuo Liu · Henry Lam · Hongseok Namkoong 🔗 |
-
|
Unlearning Tabular Data Without a "Forget Set" ( Poster ) > link | Aviraj Newatia · Michael Cooper · Rahul Krishnan 🔗 |
-
|
From One to Zero: RAG-IM Adapts Language Models for Interpretable Zero-Shot Predictions on Clinical Tabular Data ( Poster ) > link | Sazan Mahbub · Caleb Ellington · Sina Alinejad · Kevin Wen · Yingtao Luo · Ben Lengerich · Eric Xing 🔗 |
-
|
Adaptivee: Adaptive Ensemble for Tabular Data ( Poster ) > link | Dawid Płudowski · Katarzyna Woźnica 🔗 |
-
|
Distributionally robust self-supervised learning for tabular data ( Poster ) > link | Shantanu Ghosh · Tiankang Xie · Mikhail Kuznetsov 🔗 |
-
|
Exploration of autoregressive models for in-context learning on tabular data ( Poster ) > link | Stefan Baur · Sohyeong Kim 🔗 |
-
|
TabGraphs: A Benchmark and Strong Baselines for Learning on Graphs with Tabular Node Features ( Poster ) > link | Gleb Bazhenov · Oleg Platonov · Liudmila Prokhorenkova 🔗 |
-
|
Adapting TabPFN for Zero-Inflated Metagenomic Data ( Poster ) > link | Giulia Perciballi · Federica Granese · Ahmad Fall · Farida ZEHRAOUI · Edi Prifti · Jean-Daniel Zucker 🔗 |
-
|
HySem: A context length optimized LLM pipeline for unstructured tabular extraction ( Poster ) > link | Narayanan PP · Anantharaman Palacode Narayana Iyer 🔗 |