

Poster in Workshop: Interpretable AI: Past, Present and Future

Policy-shaped prediction: improving world modeling through interpretability

Miles Hutson · Isaac Kauvar · Nick Haber


Abstract:

Model-based reinforcement learning (MBRL) offers sample-efficient policy optimization but is susceptible to distractions. We address this by developing Policy-Shaped Prediction (PSP), a method that empowers agents to interpret their own policies and shape their world models accordingly. By combining gradient-based interpretability, pretrained segmentation models, and adversarial learning, PSP outperforms existing distractor-reduction approaches. This work represents an interpretability-driven advance towards robust MBRL.
