Poster in Workshop: AI for Accelerated Materials Design (AI4Mat-2023)
Impacts of Data and Models on Unsupervised Pre-training for Molecular Property Prediction
Elizabeth Coda · Gihan Panapitiya · Emily Saldanha
Keywords: [ Unsupervised pre-training ] [ Molecular Property Prediction ]
Labeled data for molecular property prediction are limited in size because experiments are costly and time-consuming. Unsupervised learning techniques, however, can leverage vast databases of molecular structures, significantly expanding the scope of training data. We compare how pre-training data and modeling choices affect the downstream task of molecular aqueous solubility prediction. We also compare the global and local structure of the learned latent spaces to probe the properties of effective pre-training approaches. We find that the pre-training modeling choices affect both predictive performance and latent space structure much more than the data choices do.