Poster in Workshop: Optimal Transport and Machine Learning
Applications of Optimal Transport Distances in Unsupervised AutoML
Prabhant Singh · Joaquin Vanschoren
In this work, we explore the utility of Optimal Transport-based dataset similarity for finding similar unlabeled tabular datasets, especially in the context of automated machine learning (AutoML) on unsupervised tasks. Unsupervised tasks have no ground truth for optimization techniques to optimize towards, but there is often historical information on which pipelines worked best on earlier tasks. We therefore propose to meta-learn over these prior tasks to transfer useful pipelines to new tasks. Our intuition is that pipelines that worked well on datasets with a similar underlying data distribution will also work well on new datasets. We use Optimal Transport distances to measure this similarity between unlabeled tabular datasets and to recommend machine learning pipelines for two downstream unsupervised tasks: outlier detection and clustering. We obtain very promising results against existing baselines and state-of-the-art methods.
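To make the recommendation idea concrete, here is a minimal sketch, not the authors' method. It assumes that every dataset is a list of numeric rows in a shared feature space, that the compared datasets have the same number of rows, and that each past dataset is tagged with the pipeline that worked best on it. In place of a full Optimal Transport solver it uses the sliced Wasserstein distance, an OT-based approximation that averages exact 1-D Wasserstein distances over random projections. The names `sliced_wasserstein`, `recommend_pipeline`, `history`, and `best_pipeline` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical representation: an unlabeled tabular dataset is a list of numeric rows.
Dataset = List[Sequence[float]]


def wasserstein_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Exact W1 between two equal-size, uniformly weighted 1-D samples.

    In 1-D the optimal transport plan simply matches sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def random_direction(dim: int, rng: random.Random) -> List[float]:
    """Draw a unit vector uniformly at random on the sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein(a: Dataset, b: Dataset,
                       n_projections: int = 50, seed: int = 0) -> float:
    """Approximate OT distance: mean 1-D W1 over random projections."""
    dim = len(a[0])
    if any(len(row) != dim for row in list(a) + list(b)):
        raise ValueError("this sketch assumes a shared feature space")
    rng = random.Random(seed)  # fixed seed so distances are reproducible
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        pa = [sum(t * x for t, x in zip(theta, row)) for row in a]
        pb = [sum(t * x for t, x in zip(theta, row)) for row in b]
        total += wasserstein_1d(pa, pb)
    return total / n_projections


def recommend_pipeline(new_data: Dataset,
                       history: Dict[str, Dataset],
                       best_pipeline: Dict[str, str],
                       **kwargs) -> str:
    """Return the best known pipeline of the most similar past dataset."""
    nearest = min(history,
                  key=lambda name: sliced_wasserstein(new_data, history[name], **kwargs))
    return best_pipeline[nearest]
```

A nearest-neighbour lookup is the simplest possible form of "transfer pipelines from similar datasets". A real system would likely rank several neighbours, handle datasets of different sizes with non-uniform weights, and use an exact or entropic OT solver, for example from the POT library, instead of this stdlib-only approximation.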