Poster in Workshop: Table Representation Learning Workshop
Introducing the Observatory Library for End-to-End Table Embedding Inference
Tianji Cong · Zhenjie Sun · Paul Groth · H. V. Jagadish · Madelon Hulsebos
Keywords: [ Table Representation Learning ] [ tabular language models ] [ End-to-End Table Embedding Inference ]
Transformer-based tabular language models have become prevalent for a wide range of applications involving tabular data. Such models require a table to be serialized as a sequence of tokens for model ingestion and embedding inference. Different downstream tasks require different kinds or levels of embeddings, such as column or entity embeddings. Hence, various serialization and encoding methods have been proposed and implemented. Surprisingly, this conceptually simple process of creating table embeddings is not straightforward in practice for three reasons: 1) a model may not natively expose a certain level of embedding; 2) choosing the correct table serialization and input preprocessing methods is difficult because many are available; and 3) tables with a massive number of rows and columns exceed the input limit of models. In this work, we extend Observatory, a framework for characterizing embeddings of relational tables, by streamlining end-to-end inference of table embeddings, which eases the use of tabular language models in practice. The codebase of Observatory is publicly available at https://github.com/superctj/observatory.
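To make the serialization and input-limit issues concrete, the following is a minimal sketch of column-wise table serialization with row truncation to fit a token budget. It is an illustration only and does not reflect Observatory's actual API: the function names, the `[COL]` and `[VAL]` markers, and the whitespace-based token count are all assumptions introduced here, and a real pipeline would use the tokenizer of the chosen model instead.

```python
from typing import Dict, List

# Hypothetical in-memory table type: column name -> list of cell values.
Table = Dict[str, List[str]]


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated pieces stand in for model tokens.
    return len(text.split())


def serialize_by_column(table: Table, num_rows: int) -> str:
    # Emit each column as "[COL] <name> [VAL] <cell> | <cell> ...",
    # keeping only the first num_rows cells. The markers are illustrative.
    parts = []
    for name, cells in table.items():
        parts.append("[COL] " + name + " [VAL] " + " | ".join(cells[:num_rows]))
    return " ".join(parts)


def fit_to_budget(table: Table, max_tokens: int) -> str:
    # Drop rows from the bottom until the serialized table fits the budget.
    total_rows = max((len(cells) for cells in table.values()), default=0)
    for num_rows in range(total_rows, -1, -1):
        text = serialize_by_column(table, num_rows)
        if count_tokens(text) <= max_tokens:
            return text
    raise ValueError("Column headers alone exceed the token budget")


table = {
    "city": ["Paris", "Lima", "Oslo"],
    "country": ["France", "Peru", "Norway"],
}
full = serialize_by_column(table, 3)
short = fit_to_budget(table, max_tokens=12)
print(full)
print(short)
```

Dropping trailing rows is only one possible truncation policy; sampling rows or splitting the table into chunks are alternatives, and the appropriate choice depends on the model and the downstream task.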