Poster

Near-Optimal Reinforcement Learning in Dynamic Treatment Regimes

Junzhe Zhang · Elias Bareinboim

East Exhibition Hall B, C #136

Keywords: [ Probabilistic Methods ] [ Causal Inference ] [ Probabilistic Methods -> Graphical Models ] [ Reinforcement Learning and Planning ] [ Reinforcement Learning ]


Abstract:

A dynamic treatment regime (DTR) consists of a sequence of decision rules, one per stage of intervention, that dictate how treatment assignments are determined for each patient based on the evolving history of treatments and covariates. These regimes are particularly effective for managing chronic disorders and are arguably among the key steps toward more personalized decision-making. In this paper, we investigate the online reinforcement learning (RL) problem of selecting optimal DTRs when observational data is available. We develop the first adaptive algorithm that achieves near-optimal regret in DTRs in online settings, without any access to historical data. We further derive informative bounds on the system dynamics of the underlying DTR from confounded observational data. Finally, we combine these results and develop a novel RL algorithm that efficiently learns the optimal DTR while leveraging the abundant, yet imperfect, confounded observations.
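The core idea sketched in the abstract, bounding interventional quantities from confounded observational data and using those bounds to temper optimistic exploration, can be illustrated in a simplified form. The following is a minimal sketch, not the paper's actual algorithm: it assumes a single-stage, binary-treatment reduction of a DTR, Manski-style bounds on the interventional mean E[Y | do(X=x)], and a standard UCB index clipped by the causal upper bound. All names and numbers here (causal_bounds, obs, true_means) are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Manski-style bounds on E[Y | do(X=x)] from observational quantities,
# valid without any assumption on the unobserved confounders:
#   P(Y=1, X=x) <= E[Y | do(X=x)] <= P(Y=1, X=x) + P(X != x)
def causal_bounds(p_y1_and_x, p_x):
    lower = p_y1_and_x
    upper = p_y1_and_x + (1.0 - p_x)
    return lower, upper

# Observational quantities, assumed estimated from a large confounded
# dataset: arm -> (P(Y=1, X=x), P(X=x)).  Hypothetical values.
obs = {0: (0.2, 0.5), 1: (0.3, 0.5)}
bounds = {x: causal_bounds(*obs[x]) for x in obs}

# True (unknown to the learner) interventional means E[Y | do(X=x)],
# consistent with the bounds above: arm 0 in [0.2, 0.7], arm 1 in [0.3, 0.8].
true_means = {0: 0.4, 1: 0.6}

T = 5000
counts = {0: 0, 1: 0}
sums = {0: 0.0, 1: 0.0}

for t in range(1, T + 1):
    ucb = {}
    for x in (0, 1):
        if counts[x] == 0:
            idx = 1.0  # rewards lie in [0, 1], so 1.0 is trivially optimistic
        else:
            mean = sums[x] / counts[x]
            idx = mean + np.sqrt(2.0 * np.log(t) / counts[x])
        # Clip the optimistic index by the causal upper bound: arms whose
        # observational upper bound is low are explored less.
        ucb[x] = min(idx, bounds[x][1])
    arm = max(ucb, key=ucb.get)
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += reward

print({x: round(sums[x] / max(counts[x], 1), 3) for x in (0, 1)})
```

The design intuition, under these simplifying assumptions, is that an arm whose causal upper bound already falls below another arm's lower bound is never over-explored, which is the mechanism by which informative observational data can reduce online regret.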
