Poster
Reproducibility Study of "Explaining RL Decisions with Trajectories"
Bart Aaldering · Clio Feng · Colin Bot · Bart Boef
This paper reports a reproducibility study of "Explaining RL Decisions with Trajectories" by Deshmukh et al. (2023). The authors proposed a method to explain the decisions of an offline RL agent by attributing them to clusters of trajectories encountered during training. The original paper explored various environments and conducted a human study to gauge real-world performance. Our objective is to validate the effectiveness of their proposed approach. We conducted quantitative and qualitative experiments across three environments: a Grid-world, an Atari video game (Seaquest), and a continuous control task from MuJoCo (HalfCheetah). While the authors provided code for the Grid-world environment, we re-implemented the method for the Seaquest and HalfCheetah environments. This work extends the original paper by including trajectory rankings within a cluster, experimenting with alternative trajectory clustering, and expanding the human study. The results affirm the effectiveness of the method, both in the reproduction and in the additional experiments. However, the results of the human study suggest that the method's explanations are harder for humans to interpret in more complex environments. Our implementations can be found on GitHub.