

Oral in Workshop: New Frontiers of AI for Drug Discovery and Development

DrugImprover: Utilizing Reinforcement Learning for Multi-Objective Alignment in Drug Optimization

Xuefeng Liu · Songhao Jiang · Archit Vasan · Alexander Brace · Ozan Gokdemir · Thomas Brettin · Fangfang Xia · Ian Foster · Rick Stevens

Keywords: [ Drug optimization ] [ Reinforcement Learning ] [ AI alignment ]


Abstract:

Reinforcement learning from human feedback (RLHF) is a method for finetuning large language models (LLMs) that yields notable performance improvements and better alignment with human values. Inspired by RLHF, this work addresses drug optimization. We use reinforcement learning to finetune a drug optimization model so that it improves the original drug across multiple target objectives while retaining the original drug's beneficial chemical properties. Our proposal comprises three primary components: (1) DrugImprover, a framework tailored for improving robustness and efficiency in drug optimization; (2) a novel Advantage-alignment Policy Optimization (APO) algorithm with multi-critic guided exploration for finetuning the objective-oriented properties; (3) a dataset of 2 million compounds, each with OEDOCK docking scores on two proteins, 3CLPro (PDBID: 7BQY) and RTCB (PDBID: 4DWQ), from SARS-CoV-2 and human cancer cells, respectively. We conduct a comprehensive evaluation of APO and demonstrate its effectiveness in improving the original drug across multiple properties.
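
To make the idea of scoring a candidate against the original drug across several objectives concrete, here is a minimal sketch. It is not the APO algorithm from the paper, whose details the abstract does not give. It assumes each critic returns a score in [0, 1] where higher is better, combines critics by a weighted average, and defines the advantage as the candidate's combined score minus the original's. The names `aggregate_score` and `advantage`, the critic names, and the toy score tables are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical critic type: maps a molecule string (e.g. SMILES) to a
# score in [0, 1], higher is better. Real critics (docking, similarity,
# drug-likeness, ...) are assumptions and are not implemented here.
Critic = Callable[[str], float]


def aggregate_score(molecule: str,
                    critics: Dict[str, Critic],
                    weights: Dict[str, float]) -> float:
    """Weighted average of all critic scores for one molecule."""
    total_weight = sum(weights[name] for name in critics)
    weighted = sum(weights[name] * critic(molecule)
                   for name, critic in critics.items())
    return weighted / total_weight


def advantage(original: str,
              candidate: str,
              critics: Dict[str, Critic],
              weights: Dict[str, float]) -> float:
    """Positive when the candidate beats the original on the combined score."""
    return (aggregate_score(candidate, critics, weights)
            - aggregate_score(original, critics, weights))


# Toy lookup tables standing in for real critics (made-up numbers).
toy_docking = {"CCO": 0.2, "CCN": 0.6}
toy_similarity = {"CCO": 1.0, "CCN": 0.8}

critics = {
    "docking": toy_docking.__getitem__,
    "similarity": toy_similarity.__getitem__,
}
weights = {"docking": 0.5, "similarity": 0.5}

adv = advantage("CCO", "CCN", critics, weights)
print(f"advantage of candidate over original: {adv:.3f}")
```

In this toy setup the candidate gains on the docking critic and loses a little on similarity to the original, so the combined advantage is small but positive. A reinforcement learning finetuning loop could use such a signal as a reward, though how the paper actually combines its critics is not stated in the abstract.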
