Poster in Workshop: New Frontiers of AI for Drug Discovery and Development
Synthon Embeddings for Modeling DNA Encoded Libraries
Benson Chen · Mohammad Sultan · Theofanis Karaletsos
Keywords: [ DNA-encoded library ] [ Molecule Representation Learning ]
DNA-encoded libraries (DELs) have proven to be a powerful tool that uses combinatorially constructed small molecules to enable highly efficient screening assays. These selection experiments, which involve multiple stages of washing, elution, and identification of potent binders via unique DNA barcodes, often generate complex data. This complexity can mask the underlying signal, motivating the use of computational tools such as machine learning to uncover valuable insights. We introduce an approach to modeling DEL data that decomposes the molecular representation into its mono-synthon and di-synthon building blocks, capitalizing on the inherent hierarchical structure of these molecules. Additionally, we investigate several methods of integrating covariate factors to account for data noise more effectively. Our model demonstrates strong performance compared to count baselines, enriches the correct pharmacophores, and offers valuable insights through its intrinsically interpretable structure, thereby providing a robust tool for the analysis of DEL data.
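The decomposition into mono-synthon and di-synthon building blocks can be illustrated with a minimal sketch. This is not the authors' model: the names `synthon_keys` and `SynthonEmbedder`, the random lookup table standing in for learned embeddings, and the simple summation used to combine them are all assumptions made for illustration only.

```python
from itertools import combinations
import random


def synthon_keys(synthons):
    """Return mono- and di-synthon keys for one library member.

    Each key is a tuple of (position, synthon_id) pairs, so the same
    building block used at different cycle positions stays distinct.
    """
    indexed = list(enumerate(synthons))
    mono = [(item,) for item in indexed]
    di = list(combinations(indexed, 2))
    return mono, di


class SynthonEmbedder:
    """Hypothetical embedding table; random vectors stand in for learned ones."""

    def __init__(self, dim=4, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.table = {}

    def lookup(self, key):
        # Create a vector the first time a key is seen, then reuse it.
        if key not in self.table:
            self.table[key] = [self.rng.gauss(0.0, 1.0) for _ in range(self.dim)]
        return self.table[key]

    def embed(self, synthons):
        # Assumption: the molecule representation is the sum of its
        # mono- and di-synthon vectors; other aggregations are possible.
        mono, di = synthon_keys(synthons)
        total = [0.0] * self.dim
        for key in mono + di:
            total = [t + v for t, v in zip(total, self.lookup(key))]
        return total


embedder = SynthonEmbedder()
molecule_a = embedder.embed(["A1", "B7", "C3"])
molecule_b = embedder.embed(["A1", "B7", "C9"])  # shares A1, B7 and the A1-B7 pair
```

In this sketch, two library members that share building blocks reuse the same mono- and di-synthon vectors, which is one way the hierarchical structure of a combinatorial library could be shared across molecules.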