Poster in Workshop: Table Representation Learning

Generic Entity Resolution Models

Jiawei Tang · Yifei Zuo · Lei Cao · Samuel Madden

Keywords: [ transformer ] [ GPT-3 ] [ Entity Resolution ] [ Generic Model ]


Abstract:

Entity resolution (ER), which decides whether two data records refer to the same real-world object, is a long-standing data integration problem. State-of-the-art results on ER are achieved by deep-learning-based methods, which typically convert each pair of records into a distributed representation and then use a binary classifier to decide whether the two records are a match or a non-match. However, these methods are dataset-specific: a model must be trained or fine-tuned for each new dataset, so it does not generalize. We therefore call them specific ER models. In this paper, we investigate generic ER models, which use a single model to serve multiple ER datasets from various domains. In particular, we study two types of generic ER models: employing a foundation model (e.g., GPT-3), or training a generic ER model. Our results show that although GPT-3 can perform ER with zero-shot or few-shot learning, its performance is worse than that of specific ER models. Our trained generic ER model achieves performance comparable to that of specific ER models, but with much less training data and much smaller storage overhead.
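
To make the zero-shot and few-shot setting concrete, the following is a minimal sketch of how a record pair might be turned into a text prompt for a foundation model. It is an illustration under stated assumptions, not the authors' prompt or pipeline: the function names (`serialize_record`, `build_prompt`, `parse_answer`), the prompt wording, and the sample records are all hypothetical, and no model is actually called.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def serialize_record(record: Record) -> str:
    """Flatten a record into 'attr: value' text, in sorted attribute order."""
    return "; ".join(f"{key}: {record[key]}" for key in sorted(record))


def build_prompt(
    examples: List[Tuple[Record, Record, bool]],
    query: Tuple[Record, Record],
) -> str:
    """Build a few-shot prompt; an empty `examples` list gives a zero-shot prompt."""
    # Hypothetical instruction wording, chosen only for this sketch.
    lines = [
        "Do the two records refer to the same real-world entity? Answer Yes or No.",
        "",
    ]
    # Each labeled pair becomes one in-context demonstration.
    for left, right, is_match in examples:
        lines += [
            f"Record A: {serialize_record(left)}",
            f"Record B: {serialize_record(right)}",
            f"Answer: {'Yes' if is_match else 'No'}",
            "",
        ]
    # The query pair is left unanswered for the model to complete.
    left, right = query
    lines += [
        f"Record A: {serialize_record(left)}",
        f"Record B: {serialize_record(right)}",
        "Answer:",
    ]
    return "\n".join(lines)


def parse_answer(completion: str) -> bool:
    """Map a model completion to a match (True) or non-match (False) decision."""
    return completion.strip().lower().startswith("yes")


# Made-up records, for illustration only.
demo = [
    (
        {"title": "iPhone 13 128GB", "brand": "Apple"},
        {"title": "Apple iPhone 13 (128 GB)", "brand": "Apple"},
        True,
    )
]
query = (
    {"title": "Galaxy S21 Ultra", "brand": "Samsung"},
    {"title": "Samsung Galaxy S21", "brand": "Samsung"},
)
prompt = build_prompt(demo, query)
```

Passing an empty list as `examples` yields the zero-shot variant. The resulting `prompt` string would be sent to whatever text-completion model is in use, and `parse_answer` applied to the returned text.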
