

Oral
in
Workshop: Table Representation Learning Workshop (TRL)

TabDiff: a Unified Diffusion Model for Multi-Modal Tabular Data Generation

Juntong Shi · Minkai Xu · Harper Hua · Hengrui Zhang · Stefano Ermon · Jure Leskovec

Keywords: [ Generative Models ] [ Tabular Representation Learning ] [ Diffusion Models ]

Sat 14 Dec 10:45 a.m. PST — 10:55 a.m. PST

Abstract:

Synthesizing high-quality tabular data is an important topic in many data science applications, ranging from dataset augmentation to privacy protection. However, developing expressive generative models for tabular data is challenging due to its inherently heterogeneous data types and intricate column-wise distributions. In this paper, we introduce TabDiff, a unified diffusion framework that models all multi-modal distributions of mixed-type tabular data in one model. Our key insight is to design different continuous-time diffusion processes for numerical and categorical data, and to learn one model that simultaneously predicts the noise for both modalities. To counter the high disparity among feature distributions, we further introduce feature-wise learnable diffusion processes that optimally balance the generative performance. The entire framework can be efficiently optimized in an end-to-end fashion. Comprehensive experiments on seven datasets demonstrate that TabDiff achieves superior average performance over existing competitive baselines on five out of six metrics.
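To make the core idea concrete, the following is a minimal sketch (not the authors' code; all names, the power-law schedule, and the absorbing-state choice are illustrative assumptions) of modality-specific continuous-time forward processes with a per-feature learnable schedule parameter: numerical columns follow a Gaussian diffusion, while categorical columns follow an absorbing ("mask") process.

```python
import numpy as np

MASK = -1  # hypothetical absorbing state for categorical features

def alpha(t, rho):
    """Signal-retention schedule on t in [0, 1]; rho > 0 is a
    per-feature parameter that would be learned in the full model."""
    return (1.0 - t) ** rho

def forward_numerical(x0, t, rho, rng):
    """Gaussian forward process: x_t = alpha(t) * x0 + sigma(t) * eps,
    with a variance-preserving sigma."""
    a = alpha(t, rho)
    sigma = np.sqrt(1.0 - a ** 2)
    return a * x0 + sigma * rng.standard_normal(x0.shape)

def forward_categorical(x0, t, rho, rng):
    """Absorbing forward process: each entry is replaced by MASK
    with probability 1 - alpha(t)."""
    keep = rng.random(x0.shape) < alpha(t, rho)
    return np.where(keep, x0, MASK)

rng = np.random.default_rng(0)
num = rng.standard_normal(5)        # one numerical column, 5 rows
cat = rng.integers(0, 3, size=5)    # one categorical column, 3 classes

# Different rho per feature lets each column noise at its own rate.
x_num_t = forward_numerical(num, t=0.5, rho=1.5, rng=rng)
x_cat_t = forward_categorical(cat, t=0.5, rho=0.7, rng=rng)
```

In the paper's actual framework, a single network is trained to denoise both modalities jointly and the schedule parameters are optimized end-to-end with the model; the sketch above only illustrates how the two forward processes and the feature-wise schedules fit together.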
