

Poster

WelQrate: Defining the Gold Standard in Small Molecule Drug Discovery Benchmarking

Yunchao Liu · Ha Dong · Xin Wang · Rocco Moretti · Yu Wang · Zhaoqian Su · Jiawei Gu · Bobby Bodenheimer · Charles Weaver · Jens Meiler · Tyler Derr

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

While deep learning has revolutionized computer-aided drug discovery, the AI community has predominantly focused on model innovation and placed less emphasis on establishing best benchmarking practices. We posit that without a sound model evaluation framework, the AI community's efforts cannot reach their full potential, thereby slowing the progress and transfer of innovation into real-world drug discovery. Thus, in this paper, we seek to establish a new gold standard for small molecule drug discovery benchmarking, WelQrate. Specifically, our contributions are threefold: (1) Data Curation Pipeline: we introduce a meticulously curated collection of 9 datasets spanning 5 therapeutic target classes. Our hierarchical curation pipelines, designed by drug discovery experts, go beyond the primary high-throughput screen by leveraging additional confirmatory and counter screens along with rigorous domain-driven preprocessing, such as Pan-Assay Interference Compounds (PAINS) filtering, to ensure high-quality labeling of active molecules. (2) Evaluation Framework: we propose a standardized model evaluation framework covering featurization, 3D structure generation, and evaluation metrics, which provides reliable benchmarking for drug discovery experts conducting real-world virtual screening. (3) Benchmarking: we benchmark existing representative deep learning architectures (e.g., 2D/3D graph neural networks) on WelQrate, while also empirically highlighting the importance of the high-quality activity labeling performed in our data curation pipeline. In summary, we recommend adopting WelQrate as the gold standard in small molecule drug discovery benchmarking. WelQrate, including its datasets, code, and experimental scripts, is publicly available at WelQrate.org.
