Poster in Workshop: Information-Theoretic Principles in Cognitive Systems (InfoCog)
Empowerment, Free Energy Principle and Maximum Occupancy Principle Compared
Ruben Moreno Bote · Jorge Ramirez Ruiz
While the objective of reward maximization in reinforcement learning has led to impressive achievements in several games and artificial environments, animals seem to be driven by intrinsic signals, such as curiosity, rather than extrinsic rewards alone. Several reward-free approaches have emerged in the fields of cognitive neuroscience and artificial intelligence that primarily make use of signals other than extrinsic rewards to guide exploration and ultimately drive behavior, but a comparison between these approaches is lacking. Here we focus on two popular reward-free approaches, known as empowerment (MPOW) and the free energy principle (FEP), and a recently developed one, called the maximum occupancy principle (MOP), and compare them in sequential problems and fully observable environments. We find that MPOW shows a preference for unstable fixed points of the dynamical system that defines the agent and environment. FEP is shown to be equivalent to reward maximization in certain cases. Neither of these two principles seems to consistently generate variable behavior: behavior collapses to a small repertoire of possible action-state trajectories or fixed points. Collapse to an optimal deterministic policy can be proved in specific, recent implementations of FEP, with the only exception of policy degeneracy due to ties. In contrast, MOP consistently generates variable action-state trajectories. In two simple environments, a balancing cartpole and a grid world, we find that both MPOW and FEP agents stick to a relatively small set of states and actions, while MOP agents generate a variety of exploratory, dancing-like motions.
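For readers comparing the three principles, a minimal sketch of the objectives as they are commonly written in the literature may help; the notation below (horizon K, weights \alpha and \beta, discount \gamma) is assumed for illustration and is not taken from the poster itself. Empowerment is usually defined as the channel capacity from action sequences to future states,

\mathcal{E}(s) \;=\; \max_{p(a_{1:K})} I\!\left(A_{1:K};\, S_K \mid S_0 = s\right),

FEP agents in active inference typically select policies by minimizing an expected free energy of the form

G(\pi) \;=\; \sum_{\tau} \mathbb{E}_{Q(o_\tau, s_\tau \mid \pi)}\!\left[\ln Q(s_\tau \mid \pi) - \ln P(o_\tau, s_\tau)\right],

and MOP maximizes the expected discounted entropy of future action-state paths, which can be written as a recursion of the form

V^{\pi}(s) \;=\; \mathbb{E}_{a \sim \pi(\cdot \mid s),\; s' \sim p(\cdot \mid s, a)}\!\left[-\alpha \ln \pi(a \mid s) \;-\; \beta \ln p(s' \mid s, a) \;+\; \gamma\, V^{\pi}(s')\right].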