Poster in Workshop: Table Representation Learning Workshop
Tree-Regularized Tabular Embeddings
Xuan Li · Yun Wang · Bo Li
Keywords: [ Representation Learning ] [ Supervised Pretraining ] [ Tabular ] [ Deep Neural Networks ] [ Regularization ]
Tabular neural networks (NNs) have attracted remarkable attention, and recent advances have gradually narrowed their performance gap with tree-based models on many public datasets. While mainstream efforts focus on adapting NN architectures to fit tabular data, we emphasize the importance of homogeneous embeddings and instead concentrate on regularizing tabular inputs through supervised pretraining. Specifically, we extend a recent work named DeepTLF and utilize the structure of pretrained tree ensembles to transform raw variables into a single vector (T2V) or an array of tokens (T2T). Without loss of space efficiency, these binarized embeddings can be directly consumed by canonical tabular NNs with fully-connected or attention-based building blocks. Through quantitative experiments on 88 OpenML binary classification datasets, we validate that the proposed tree-regularized representations not only narrow the gap with tree-based models, but also achieve on-par or better performance compared with advanced NN models. Most importantly, the proposed representations are more robust and can be readily scaled and generalized as a standalone encoder for the tabular modality.
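The tree-driven binarization described above can be sketched roughly as follows. This is an illustrative assumption, not the authors' released code: in the spirit of DeepTLF, each internal split node of a pretrained gradient-boosted ensemble contributes one binary indicator per sample (did the sample satisfy the split condition?), and the concatenation of these bits forms a T2V-style embedding that a downstream fully-connected NN can consume. The helper name `tree_to_vector` is hypothetical.

```python
# Hypothetical sketch of tree-regularized binarization (T2V-style), assuming
# a scikit-learn gradient-boosted ensemble as the pretrained tree model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier


def tree_to_vector(model, X):
    """Binarize X using the split conditions of a fitted tree ensemble.

    One bit per internal node: 1 if the sample satisfies the node's
    split condition (feature value <= threshold), else 0.
    """
    bits = []
    for est in model.estimators_.ravel():
        tree = est.tree_
        internal = tree.children_left != -1       # keep split nodes, drop leaves
        feats = tree.feature[internal]            # feature index tested at each split
        thrs = tree.threshold[internal]           # threshold tested at each split
        bits.append((X[:, feats] <= thrs).astype(np.float32))
    # Shape: (n_samples, total number of internal nodes across all trees)
    return np.hstack(bits)


X, y = make_classification(n_samples=200, n_features=10, random_state=0)
gbdt = GradientBoostingClassifier(
    n_estimators=20, max_depth=3, random_state=0
).fit(X, y)

# Binary embedding ready to feed a fully-connected or attention-based NN.
Z = tree_to_vector(gbdt, X)
```

A T2T variant would instead keep the bits grouped per tree, yielding one token per tree for an attention-based backbone; the vectorized version shown here simply flattens everything into a single binary vector per sample.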