Poster in Workshop: AI for New Drug Modalities
PepDoRA: A Unified Peptide Language Model via Weight-Decomposed Low-Rank Adaptation
Leyao Wang · Rishab Pulugurta · Pranay Vure · Aastha Pal · Yinuo Zhang · Pranam Chatterjee
Peptide therapeutics, including macrocycles, peptide inhibitors, and bioactive linear peptides, play a crucial role in drug development due to their unique chemical properties, yet predicting their functional properties remains challenging. While structure-based models focus primarily on local interactions, language models can capture the global therapeutic properties of both modified and linear peptides. Still, protein language models such as ESM-2, though effective for natural peptides, cannot encode chemical modifications. Conversely, pre-trained chemical language models, such as ChemBERTa or Chemformer, excel at representing small molecules but are not optimized for peptides. To bridge this gap, we introduce PepDoRA, a unified peptide representation model. Leveraging Weight-Decomposed Low-Rank Adaptation (DoRA), PepDoRA efficiently fine-tunes ChemBERTa-77M-MLM with a masked language modeling (MLM) objective to generate optimized embeddings for downstream tasks involving both modified and unmodified peptides. Specifically, we show that PepDoRA embeddings capture functional properties of input peptides, enabling accurate prediction of membrane permeability. We further show that PepDoRA, when integrated into a contrastive language model alongside ESM-2 target protein embeddings, effectively learns target-protein-specific binding properties. Overall, by providing a unified representation for diverse peptides, PepDoRA serves as a versatile tool for function and activity prediction, facilitating the development of peptide therapeutics across a broad spectrum of applications.
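To make the fine-tuning recipe concrete, the sketch below shows what DoRA adaptation of ChemBERTa-77M-MLM on an MLM objective could look like using Hugging Face transformers and peft (which exposes DoRA through the use_dora flag on a standard LoraConfig). This is a minimal illustration, not the authors' released code: the rank, scaling factor, target modules, masking probability, and example SMILES string are all illustrative assumptions.

```python
# Minimal sketch of DoRA fine-tuning on a masked-language-modeling objective.
# Assumes Hugging Face `transformers` and `peft` (peft >= 0.9 supports DoRA
# via use_dora=True). Hyperparameters and the input SMILES are assumptions.
import torch
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model

base = "DeepChem/ChemBERTa-77M-MLM"  # checkpoint named in the abstract
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# DoRA decomposes each adapted weight into a learned magnitude vector m and a
# direction updated by a low-rank LoRA term: W' = m * (W0 + BA) / ||W0 + BA||.
config = LoraConfig(
    r=8,                                # assumed low-rank dimension
    lora_alpha=16,                      # assumed scaling factor
    target_modules=["query", "value"],  # assumed attention projections
    use_dora=True,                      # weight-decomposed low-rank adaptation
)
model = get_peft_model(model, config)
model.print_trainable_parameters()

# One MLM training step on a toy (hypothetical) peptide SMILES string.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
batch = collator([tokenizer("CC(C)C[C@@H](C(=O)O)N", truncation=True)])
loss = model(**batch).loss
loss.backward()  # optimizer step omitted for brevity

# After fine-tuning, mean-pool the final hidden layer to obtain a peptide
# embedding for downstream property prediction.
with torch.no_grad():
    inputs = tokenizer("CC(C)C[C@@H](C(=O)O)N", return_tensors="pt")
    hidden = model(**inputs, output_hidden_states=True).hidden_states[-1]
embedding = hidden.mean(dim=1)  # shape: (1, hidden_dim)
```

Embeddings pooled this way could then feed a permeability regressor or be paired with ESM-2 target protein embeddings in a contrastive objective, as the abstract describes; those downstream heads are not shown here.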