Poster in Workshop: Adaptive Experimental Design and Active Learning in the Real World
Cross-Entropy Estimators for Sequential Experiment Design with Reinforcement Learning
Tom Blau · Iadine Chades · Amir Dezfouli · Daniel Steinberg · Edwin Bonilla
Reinforcement learning can learn amortised design policies for sequences of experiments. However, current methods rely on contrastive estimators of expected information gain, which require an exponential number of contrastive samples to achieve an unbiased estimate. We propose an alternative lower-bound estimator based on the cross-entropy between the joint model distribution and a flexible proposal distribution. This proposal distribution approximates the true posterior of the model parameters given the experimental history and the design policy. Our method requires no contrastive samples, can achieve more accurate estimates of high information gains, enables the learning of superior design policies, and is compatible with implicit probabilistic models. We assess our algorithm's performance on a range of tasks, including continuous and discrete designs and explicit and implicit likelihoods.
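For context, a cross-entropy lower bound of this kind is commonly written in the Barber-Agakov (posterior) form sketched below; the notation is ours ($\theta$ for model parameters, $h_T$ for the experimental history generated by design policy $\pi$, $q$ for the proposal distribution), and the poster's exact estimator may differ:

```latex
\operatorname{EIG}(\pi)
  = \mathbb{E}_{p(\theta)\, p(h_T \mid \theta, \pi)}
      \left[ \log \frac{p(\theta \mid h_T)}{p(\theta)} \right]
  \;\geq\;
  \mathbb{E}_{p(\theta)\, p(h_T \mid \theta, \pi)}
      \left[ \log q(\theta \mid h_T) \right]
  + \mathrm{H}\!\left[ p(\theta) \right]
```

The bound is tight when $q$ equals the true posterior, and the expectation on the right is (up to sign) the cross-entropy between the joint model distribution and the proposal, so it can be estimated from joint samples alone. A minimal Monte Carlo sketch on a toy linear-Gaussian model, with purely illustrative names and a single fixed design standing in for a policy, might look like this:

```python
import numpy as np

# Illustrative sketch (names and model are ours, not the poster's):
# a cross-entropy / posterior lower bound on expected information gain,
# estimated from joint samples only -- no contrastive samples required.

rng = np.random.default_rng(0)

prior_std = 1.0   # theta ~ N(0, prior_std^2)
noise_std = 0.5   # y = theta * design + N(0, noise_std^2)
design = 2.0      # a single fixed design standing in for a policy

n = 100_000
theta = rng.normal(0.0, prior_std, size=n)               # theta ~ p(theta)
y = theta * design + rng.normal(0.0, noise_std, size=n)  # y ~ p(y | theta)

# Proposal q(theta | y): here the exact Gaussian posterior, so the bound
# is tight; in general q would be a learned, flexible density.
post_var = 1.0 / (1.0 / prior_std**2 + design**2 / noise_std**2)
post_mean = post_var * design * y / noise_std**2

# EIG >= E_{p(theta, y)}[log q(theta | y)] + H[p(theta)]
log_q = (-0.5 * np.log(2 * np.pi * post_var)
         - (theta - post_mean) ** 2 / (2 * post_var))
prior_entropy = 0.5 * np.log(2 * np.pi * np.e * prior_std**2)
bound = log_q.mean() + prior_entropy

print(f"cross-entropy lower bound: {bound:.3f}")
print(f"exact EIG (toy model):     {0.5 * np.log(prior_std**2 / post_var):.3f}")
```

On this toy model the proposal is the exact posterior, so the Monte Carlo bound matches the closed-form EIG; with a learned proposal, the gap measures how far $q$ is from the true posterior.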