Poster in Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges

MolPhenix: A Multimodal Foundation Model for PhenoMolecular Retrieval

Philip Fradkin · Puria Azadi Moghadam · Karush Suri · Frederik Wenkel · Maciej Sypetkowski · Dominique Beaini

Keywords: [ Cell-Painting ] [ Multi-Modality ] [ Molecular Retrieval ] [ Cell Morphology ] [ Molecules ] [ Contrastive Learning ] [ CLIP ] [ Zero-Shot Learning ]

presentation: Foundation Models for Science: Progress, Opportunities, and Challenges
Sun 15 Dec 8:30 a.m. PST — 5 p.m. PST

Abstract: Predicting molecular impact on cellular function is a core challenge in therapeutic design. Phenomic experiments, designed to capture cellular morphology, use microscopy-based techniques and offer a high-throughput solution for uncovering molecular impact on the cell. In this work, we learn a joint latent space between molecular structures and microscopy phenomic experiments, aligning paired samples with contrastive learning. Specifically, we study the problem of Contrastive PhenoMolecular Retrieval, which consists of zero-shot molecular structure identification conditioned on phenomic experiments. We assess challenges in multi-modal learning over the phenomics and molecular modalities, such as experimental batch effects, inactive molecule perturbations, and encoding perturbation concentration. We demonstrate improved retrieval for multi-modal learners through (1) a uni-modal pre-trained phenomics model, (2) a novel inter-sample similarity-aware loss, and (3) models conditioned on a representation of molecular concentration. Following this recipe, we propose MolPhenix, a molecular phenomics model. MolPhenix leverages a pre-trained phenomics model to demonstrate significant performance gains across perturbation concentrations, molecular scaffolds, and activity thresholds. In particular, we demonstrate an 8.1$\times$ improvement in zero-shot molecular retrieval of active molecules over the previous state-of-the-art, reaching 77.33% top-1% accuracy. These results open the door for machine learning to be applied in virtual phenomics screening, which can significantly benefit drug discovery applications.
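To make the retrieval setup in the abstract concrete, here is a minimal NumPy sketch of the general idea: paired phenomics and molecule embeddings are aligned with a contrastive objective, and retrieval is scored by whether the true molecule ranks in the top 1% of candidates. This is an illustration under assumptions, not the authors' implementation. It uses the standard symmetric CLIP-style loss as a stand-in, not the inter-sample similarity-aware loss mentioned in the abstract, and it models neither the pre-trained phenomics encoder nor concentration conditioning. The function names, the temperature value, and the embedding shapes are all hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_loss(pheno_emb, mol_emb, temperature=0.1):
    """Symmetric CLIP-style contrastive loss (a stand-in, see lead-in).

    Row i of pheno_emb and row i of mol_emb are assumed to be a matched
    (phenomic experiment, molecule) pair. The temperature is an assumed value.
    """
    p = l2_normalize(pheno_emb)
    m = l2_normalize(mol_emb)
    logits = p @ m.T / temperature
    idx = np.arange(len(p))
    # Phenomics -> molecule direction: softmax over candidate molecules.
    loss_p2m = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Molecule -> phenomics direction: softmax over candidate experiments.
    loss_m2p = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_p2m + loss_m2p)


def top_percent_accuracy(pheno_emb, mol_emb, percent=1.0):
    """Fraction of experiments whose true molecule ranks in the top `percent`%."""
    p = l2_normalize(pheno_emb)
    m = l2_normalize(mol_emb)
    sims = p @ m.T
    true_sim = np.diag(sims)[:, None]
    # Rank = number of candidate molecules scoring strictly above the true one.
    rank = (sims > true_sim).sum(axis=1)
    k = max(1, int(np.ceil(len(m) * percent / 100.0)))
    return float((rank < k).mean())


if __name__ == "__main__":
    # Synthetic embeddings only; no real phenomics or molecular data is used.
    rng = np.random.default_rng(0)
    mol = rng.normal(size=(200, 16))
    pheno = mol + 0.01 * rng.normal(size=(200, 16))
    print("contrastive loss:", clip_loss(pheno, mol))
    print("top-1% accuracy:", top_percent_accuracy(pheno, mol, percent=1.0))
```

With a candidate pool of 200 molecules, the top 1% corresponds to the two highest-scoring candidates, so the metric becomes stricter in absolute rank as the pool shrinks and more lenient as it grows.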
