Poster in Workshop: AI4Mat-2024: NeurIPS 2024 Workshop on AI for Accelerated Materials Design

Spectro: A multi-modal approach for molecule elucidation using IR and NMR data

Edwin Chacko · Rudra Sondhi · Arnav Praveen · Kylie Luska · Rodrigo Vargas-Hernandez

Keywords: [ IR ] [ molecular elucidation ] [ vision models ] [ NMR ] [ molecules ] [ LLM ] [ SELFIES ]


Abstract: Molecular structure elucidation is a crucial but fundamentally challenging step in the characterization of materials, given the large number of possible structures. Here, we introduce Spectro, an innovative multi-modal approach for molecular elucidation that combines $^{13}\ce{C}$ and $^{1}\ce{H}$ NMR data with IR spectra. Spectro translates the embedded representations of the spectra into molecular structures using the SELFIES notation. For the IR data, we employed a vision model that was pretrained to detect relevant functional group peaks in the IR spectra, achieving an F1 score of 91\%. For the NMR data, we utilized LLM2Vec, treating the NMR spectra as text. Integrating these spectroscopic techniques allows Spectro to achieve an overall test accuracy of 93\% when trained jointly with the vision model for the IR spectra, and 82\% when trained with fixed embeddings. Our approach demonstrates the potential of multi-modal learning in tackling complex molecular characterization tasks.
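As a reading aid for the F1 score quoted above, the following is a minimal sketch of how an F1 score can be computed for multi-label functional-group detection. The abstract does not state which averaging scheme was used, so micro-averaging is an assumption here, and the function name `micro_f1`, the label columns, and the toy data are all hypothetical and not taken from the Spectro work.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multi-label binary predictions.

    Each row is one spectrum; each column is one functional group
    (1 = present, 0 = absent). Counts are pooled over all labels.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1          # group present and detected
            elif p and not t:
                fp += 1          # group detected but absent
            elif t and not p:
                fn += 1          # group present but missed
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); return 0.0 when nothing is present or predicted
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels; columns stand for [O-H, C=O, N-H]
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.2f}")
```

In this toy case there are 4 true positives, 1 false positive, and 1 false negative, so the score is 2·4 / (2·4 + 1 + 1) = 0.80. A macro-averaged F1, which computes the score per functional group and then averages, would generally give a different number on imbalanced labels.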
