Workshop
Table Representation Learning
Madelon Hulsebos · Bojan Karlaš · Pengcheng Yin · Haoyu Dong
Room 398
Fri 2 Dec, 6:30 a.m. PST
We develop large models that "understand" images, videos, and natural language, fueling many intelligent applications from text completion to self-driving cars. But tabular data has long been overlooked despite its dominant presence in data-intensive systems. By learning latent representations from (semi-)structured tabular data, pretrained table models have shown preliminary but impressive performance on semantic parsing, question answering, table understanding, and data preparation. Since such tasks share fundamental properties inherent to tables, representation learning for tabular data is an important direction to explore further. This work has also surfaced many open challenges, such as finding effective data encodings, pretraining objectives, and downstream tasks.
Key questions that we aim to address in this workshop are:
- How should tabular data be encoded to make learned Table Models generalize across tasks?
- Which pre-training objectives, architectures, fine-tuning and prompting strategies, work for tabular data?
- How should the varying formats, data types, and sizes of tables be handled?
- To what extent can Language Models be adapted to tabular data tasks, and what are their limits?
- What tasks can existing Table Models accomplish well and what opportunities lie ahead?
- How do existing Table Models perform, what do they learn, where and how do they fall short?
- When and how should Table Models be updated in contexts where the underlying data source continuously evolves?
The Table Representation Learning workshop is the first workshop in this emerging research area and is centered around three main goals:
1) Motivate tabular data as a primary modality for representation learning and further shape this area.
2) Showcase impactful applications of pretrained table models and discuss future opportunities thereof.
3) Foster discussion and collaboration across the machine learning, natural language processing, and data management communities.
Speakers
Alon Halevy (keynote), Meta AI
Graham Neubig (keynote), Carnegie Mellon University
Carsten Binnig, TU Darmstadt
Çağatay Demiralp, Sigma Computing
Huan Sun, Ohio State University
Xinyun Chen, Google Brain
Panelists
Huan Sun (chair) · Frank Hutter · Heng Ji · Julian Eisenschlos · Gaël Varoquaux · Graham Neubig
Scope
We invite submissions that address, but are not limited to, any of the following topics on machine learning for tabular data:
Representation Learning Representation learning techniques for structured (e.g., relational databases) or semi-structured (Web tables, spreadsheet tables) tabular data and interfaces to it. This includes developing specialized data encodings or adapting general-purpose ones (e.g., GPT-3) for tabular data, multimodal learning across tables and other modalities (e.g., natural language, images, code), and relevant fine-tuning and prompting strategies.
Downstream Applications Machine learning applications involving tabular data, such as data preparation (e.g., data cleaning, integration, cataloging, anomaly detection), retrieval (e.g., semantic parsing, question answering, fact-checking), information extraction, and generation (e.g., table-to-text).
Upstream Applications Applications that use representation learning to optimize tabular data processing systems, such as table parsers (extracting tables from documents, spreadsheets, presentations, images), storage (e.g., compression, indexing), and querying (e.g., query plan optimization, cost estimation).
Industry Papers Applications of tabular representation models in production, and challenges of maintaining and managing table representation models in a fast-evolving context, e.g., data updates, error correction, and monitoring.
New Resources Survey papers, analyses, benchmarks, and datasets for tabular representation models and their applications, as well as visions and reflections to structure and guide future research.
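To make the encoding questions in scope more concrete, here is a minimal, hypothetical sketch of one common family of approaches: linearizing a table into a text sequence by pairing each cell with its column name, so a general-purpose language model can consume it. The marker tokens (`[HEADER]`, `[ROW]`) and the function name are illustrative assumptions, not the encoding of any specific model mentioned on this page.

```python
def linearize_table(header, rows):
    """Serialize a table into a single string that an off-the-shelf
    language model could consume, pairing each cell with its column name."""
    # Encode the schema first, then each row as "column : value" pairs.
    parts = ["[HEADER] " + " | ".join(header)]
    for row in rows:
        cells = [f"{col} : {val}" for col, val in zip(header, row)]
        parts.append("[ROW] " + " | ".join(cells))
    return " ".join(parts)

# Toy example with made-up values:
example = linearize_table(
    ["city", "population"],
    [["Amsterdam", "921402"], ["Utrecht", "361924"]],
)
print(example)
```

Real table models differ in exactly these design choices: how the schema is marked, whether rows or columns are the unit of serialization, and how numeric cells are tokenized, which is why data encodings remain an open research question.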
Important dates
Submission open: 20 August 2022
Submission deadline: 26 September 2022
Notifications: 20 October 2022
Camera-ready, slides and recording upload: 3 November 2022
Workshop: 2 December 2022
Submission formats
Abstract: 1 page + references.
Extended abstract: at most 4 pages + references.
Regular paper: at least 6 pages + references.
Questions:
table-representation-learning-workshop@googlegroups.com (public)
m.hulsebos@uva.nl (private)
Schedule
Fri 6:30 a.m. - 6:45 a.m. | Opening Remarks
Fri 6:45 a.m. - 7:30 a.m. | Keynote | Alon Halevy - "Structured Data Inside and Out"
Fri 7:30 a.m. - 7:45 a.m. | Talk | Analysis of the Attention in Tabular Language Models (Aneta Koleva · Martin Ringsquandl · Volker Tresp)
Fri 7:45 a.m. - 8:15 a.m. | Talk | Huan Sun - "Self-supervised Pre-training on Tables"
Fri 8:15 a.m. - 8:30 a.m. | Coffee/Tea Break
Fri 8:30 a.m. - 9:15 a.m. | Poster Session 1
Fri 9:15 a.m. - 9:45 a.m. | Talk | Carsten Binnig - "Pre-trained Models for Learned DBMS Components"
Fri 9:45 a.m. - 10:00 a.m. | Talk | STable: Table Generation Framework for Encoder-Decoder Models (Michał Pietruszka · Michał Turski · Łukasz Borchmann · Tomasz Dwojak · Gabriela Pałka · Karolina Szyndler · Dawid Jurkiewicz · Łukasz Garncarek)
Fri 10:00 a.m. - 10:15 a.m. | Talk | Transfer Learning with Deep Tabular Models (Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum)
Fri 10:15 a.m. - 11:30 a.m. | Lunch Break
Fri 11:30 a.m. - 12:15 p.m. | Keynote | Graham Neubig - "Unsupervised Methods for Table and Schema Understanding"
Fri 12:15 p.m. - 12:30 p.m. | Talk | Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning (David Vos · Till Döhmen · Sebastian Schelter)
Fri 12:30 p.m. - 12:45 p.m. | Talk | Byung-Hak Kim - "RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild"
Fri 12:45 p.m. - 1:00 p.m. | Talk | TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (Noah Hollmann · Samuel Müller · Katharina Eggensperger · Frank Hutter)
Fri 1:15 p.m. - 1:30 p.m. | Coffee/Tea Break
Fri 1:30 p.m. - 2:00 p.m. | Poster Session 2
Fri 2:00 p.m. - 2:30 p.m. | Talk | Xinyun Chen - "Program Synthesis from Semi-Structured Context"
Fri 2:30 p.m. - 3:30 p.m. | Panel | Huan Sun (chair), Frank Hutter, Heng Ji, Julian Eisenschlos, Gaël Varoquaux, Graham Neubig
Fri 3:30 p.m. - 3:45 p.m. | Closing Remarks
Posters
- The Need for Tabular Representation Learning: An Industry Perspective | Joyce Cahoon · Alexandra Savelieva · Andreas Mueller · Avrilia Floratou · Carlo Curino · Hiren Patel · Jordan Henkel · Markus Weimer · Roman Batoukov · Shaleen Deep · Venkatesh Emani · Richard Wydrowski · Nellie Gustafsson
- SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training | Gowthami Somepalli · Avi Schwarzschild · Micah Goldblum · C. Bayan Bruss · Tom Goldstein
- Generic Entity Resolution Models | Jiawei Tang · Yifei Zuo · Lei Cao · Samuel Madden
- RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training (Short Paper) | Zui Chen · Lei Cao · Samuel Madden
- SiMa: Federating Data Silos using GNNs | Christos Koutras · Rihan Hai · Kyriakos Psarakis · Marios Fragkoulis · Asterios Katsifodimos
- STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables | Jaehyun Nam · Jihoon Tack · Kyungmin Lee · Hankook Lee · Jinwoo Shin
- Analysis of the Attention in Tabular Language Models | Aneta Koleva · Martin Ringsquandl · Volker Tresp
- Towards Foundation Models for Relational Databases [Vision Paper] | Liane Vogel · Benjamin Hilprecht · Carsten Binnig
- Transfer Learning with Deep Tabular Models | Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum
- RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild | Weiyao Wang · Byung-Hak Kim · Varun Ganapathi
- Diffusion models for missing value imputation in tabular data | Shuhan Zheng · Nontawat Charoenphakdee
- STab: Self-supervised Learning for Tabular Data | Ehsan Hajiramezanali · Max Shen · Gabriele Scalia · Nathaniel Diamant
- TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Noah Hollmann · Samuel Müller · Katharina Eggensperger · Frank Hutter
- MapQA: A Dataset for Question Answering on Choropleth Maps | Shuaichen Chang · David Palzer · Jialin Li · Eric Fosler-Lussier · Ningchuan Xiao
- Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning | David Vos · Till Döhmen · Sebastian Schelter
- CASPR: Customer Activity Sequence based Prediction and Representation | Damian Kowalczyk · Pin-Jung Chen · Sahil Bhatnagar
- MET: Masked Encoding for Tabular Data | Kushal Majmundar · Sachin Goyal · Praneeth Netrapalli · Prateek Jain
- Conditional Contrastive Networks | Emily Mu · John Guttag
- Structural Embedding of Data Files with MAGRITTE | Gerardo Vitagliano · Mazhar Hameed · Felix Naumann
- Active Learning with Table Language Models | Martin Ringsquandl · Aneta Koleva
- Self-supervised Representation Learning Across Sequential and Tabular Features Using Transformers | Rajat Agarwal · Anand Muralidhar · Agniva Som · Hemant Kowshik
- Self Supervised Pre-training for Large Scale Tabular Data | Sharad Chitlangia · Anand Muralidhar · Rajat Agarwal
- STable: Table Generation Framework for Encoder-Decoder Models | Michał Pietruszka · Michał Turski · Łukasz Borchmann · Tomasz Dwojak · Gabriela Pałka · Karolina Szyndler · Dawid Jurkiewicz · Łukasz Garncarek
- Tabular Data Generation: Can We Fool XGBoost? | EL Hacen Zein · Tanguy Urvoy