Poster in Workshop: Machine Learning in Structural Biology
SPECTRE: A Spectral Transformer for Molecule Identification
Wangdong Xu
Nuclear Magnetic Resonance (NMR) spectroscopy is essential for identifying novel natural products. However, interpreting NMR spectra is time-consuming and requires expertise, which has led to the development of computational tools for "structure annotation": producing an ordered list of similar known molecules to speed up identification. This work introduces SPECTRE, a state-of-the-art transformer-based model for structure annotation. Key contributions include 1) a novel, entropy-optimized Morgan fingerprint (MF) that can be adjusted for different NMR spectra types, and 2) a lightweight, accurate structure annotation method that accepts flexible types of NMR input through data type dropout (DTD), a new data augmentation technique for handling missing modalities in multi-modal models. As a result, SPECTRE achieves 95.79% accuracy, a 12.18% improvement over the previous SOTA. Our code is available at https://anonymous.4open.science/r/SPECTRE-C3DE/ and the dataset is available at https://shorturl.at/bCWMb. Unfortunately, we had to remove all HSQC spectra from the dataset because of intellectual property issues.
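The abstract does not spell out how data type dropout works, so the following is only a minimal sketch of one plausible reading: during training, whole input modalities are randomly hidden from each sample so the model learns to cope with missing spectra types. The function name `data_type_dropout`, the `drop_prob` parameter, the modality keys other than HSQC, and the rule that at least one modality is always kept are all assumptions made for illustration, not details taken from SPECTRE.

```python
import random


def data_type_dropout(sample, drop_prob=0.5, rng=None):
    """Randomly hide whole modalities of one training sample.

    `sample` maps a modality name to its data, or to None if that
    modality is already missing. Hypothetical helper, not SPECTRE's code.
    """
    rng = rng or random  # the random module itself exposes random() and choice()
    present = [name for name, value in sample.items() if value is not None]
    if not present:
        raise ValueError("sample has no available modality")

    # Keep each available modality independently with probability 1 - drop_prob.
    kept = [name for name in present if rng.random() >= drop_prob]

    # Assumption: never feed the model an empty input, so restore one modality.
    if not kept:
        kept = [rng.choice(present)]

    return {name: (value if name in kept else None) for name, value in sample.items()}


# Illustrative sample: each peak list is a made-up placeholder.
example = {
    "hsqc": [(7.2, 128.5)],
    "c_nmr": [128.5, 77.0],
    "h_nmr": None,  # already missing in this sample
}
augmented = data_type_dropout(example, drop_prob=0.5, rng=random.Random(0))
```

Under these assumptions the dropout is applied per sample at training time only; at inference the model would simply receive whichever spectra types are actually available.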