

Poster in Workshop: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design

MolGen-Transformer: An open-source self-supervised model for Molecular Generation and Latent Space Exploration

Chih-Hsuan Yang · Rebekah Duke · Parker Sornberger · Moses Dominic · Chad Risko · Baskar Ganapathysubramanian

Keywords: [ Self-supervised learning ] [ Molecular generation ] [ Molecular diversity ] [ Organic molecule synthesis ] [ Latent space exploration ]


Abstract:

We present MolGen-Transformer, a generative AI model that achieves 100% reconstruction accuracy through self-supervised training on a large, curated meta-dataset of organic molecules with fewer than 168 atoms. MolGen-Transformer produces valid molecular structures using the SELF-referencing Embedded Strings (SELFIES) representation. Our training dataset comprises 198 million organic molecules, selected to encompass a wide range of organic structures. We illustrate the generative capability of this model in three ways: (a) generating chemically similar molecules, where the model creates valid molecules that are structurally similar to a given prompt molecule; (b) producing diverse molecules, where the model creates structurally diverse valid molecules from a random latent seed; and (c) identifying chemical intermediates, where the model creates a sequence of valid molecules connecting two given molecules. MolGen-Transformer thus supports the generation and exploration of structurally similar molecules and provides insight into structural pathways between molecules. The model weights and inference methods are publicly available to support community use. We also provide an easy-to-use website for exploration.
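
The abstract does not say how latent vectors are manipulated in modes (a) and (c). As a minimal sketch, assume molecules map to fixed-length latent vectors, that similar molecules come from small Gaussian perturbations of a prompt's latent vector, and that intermediates come from linear interpolation between two latent vectors. The function names `perturb_latent` and `interpolate_latents` are hypothetical and are not part of the released MolGen-Transformer code; encoding SELFIES strings to latents and decoding them back would be done by the model itself and is omitted here.

```python
import random
from typing import List

Vector = List[float]


def interpolate_latents(z_start: Vector, z_end: Vector, n_points: int) -> List[Vector]:
    """Return n_points vectors evenly spaced on the straight line from
    z_start to z_end, endpoints included.

    Assumption: linear interpolation is used here for illustration only;
    the source does not specify the interpolation scheme.
    """
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    if n_points < 2:
        raise ValueError("need at least two points (the two endpoints)")
    path = []
    for i in range(n_points):
        t = i / (n_points - 1)  # t runs from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def perturb_latent(z: Vector, scale: float, rng: random.Random) -> Vector:
    """Return a copy of z with independent Gaussian noise of standard
    deviation `scale` added to every coordinate.

    Assumption: Gaussian noise is an illustrative choice for sampling
    "nearby" latents; the source does not describe its sampling method.
    """
    return [x + rng.gauss(0.0, scale) for x in z]


if __name__ == "__main__":
    # Toy 3-dimensional latents standing in for two encoded molecules.
    z_a = [0.0, 1.0, -2.0]
    z_b = [1.0, 3.0, 2.0]

    # Mode (c): a sequence of latent points connecting the two molecules.
    for z in interpolate_latents(z_a, z_b, 5):
        print(z)

    # Mode (a): a few latent points near the prompt molecule.
    rng = random.Random(0)  # fixed seed for repeatability
    for _ in range(3):
        print(perturb_latent(z_a, 0.1, rng))
```

In such a scheme, each vector on the path or in the neighborhood would be passed to the model's decoder to obtain a SELFIES string, and the `scale` parameter would control how far generated molecules drift from the prompt.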
