Abstract:
Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (i.e., developability properties) to accelerate drug discovery initiatives. However, these models generally rely on a single structural conformation and/or a single sequence as a molecular representation. We present AbLEF, a physics-based model in which structural ensemble representations are fused by a transformer-based architecture and concatenated with a language representation to predict antibody protein properties. AbLEF enables the direct infusion of thermodynamic information into the latent space, enhancing property prediction by explicitly capturing the dynamic molecular behavior that occurs during experimental measurement. We find that $\textbf{(1)}$ ensembles of structures generated from molecular simulation can further improve antibody property prediction for small datasets, $\textbf{(2)}$ fine-tuned large protein language models can match smaller antibody-specific language models at predicting antibody properties, $\textbf{(3)}$ trained multimodal sequence and structural representations outperform sequence representations alone, $\textbf{(4)}$ pre-trained sequence-with-structure models are competitive with shallow machine learning (ML) methods in the small data regime, and $\textbf{(5)}$ predicting measured antibody properties remains difficult for limited high-fidelity datasets.
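To make the fusion step concrete, the following is a minimal NumPy sketch of the idea described above: an ensemble of structural conformation embeddings is pooled with attention-style weights and the fused vector is concatenated to a language embedding. All function names, dimensions, and the simple dot-product attention are illustrative assumptions, not the paper's actual transformer architecture.

```python
import numpy as np

def fuse_ensemble(struct_embs: np.ndarray, lang_emb: np.ndarray) -> np.ndarray:
    """Illustrative sketch (not the actual AbLEF model): attention-pool an
    ensemble of structural embeddings, then concatenate the pooled vector
    to a sequence-derived language embedding."""
    # struct_embs: (n_conformations, d_struct); lang_emb: (d_lang,)
    query = struct_embs.mean(axis=0)                       # simple pooled query
    scores = struct_embs @ query / np.sqrt(struct_embs.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                               # softmax over conformations
    fused = weights @ struct_embs                          # weighted sum: (d_struct,)
    return np.concatenate([fused, lang_emb])               # (d_struct + d_lang,)

# Toy usage: 5 conformations with 8-dim structural embeddings, 16-dim language embedding.
rng = np.random.default_rng(0)
z = fuse_ensemble(rng.normal(size=(5, 8)), rng.normal(size=(16,)))
print(z.shape)  # (24,)
```

The fused multimodal vector would then feed a downstream regression or classification head for the developability property of interest.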