Poster
Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning
Beyazit Yalcinkaya · Niklas Lauffer · Marcell Vazquez-Chanlatte · Sanjit Seshia
Goal-conditioned reinforcement learning is a powerful way to control an AI agent's behavior at runtime. That said, popular goal representations, e.g., target states or natural language, are either limited to Markovian tasks or rely on ambiguous task semantics. We propose representing temporal goals using compositions of deterministic finite automata (cDFAs). cDFAs balance the need for formal temporal semantics with ease of interpretation: if one can understand a flow chart, one can understand a cDFA. On the other hand, cDFAs form a countably infinite concept class with Boolean semantics, and subtle changes to the automata can result in very different agent behavior. To address this, we observe that all paths through a DFA correspond to a series of reach-avoid tasks. Based on this, we propose pre-training graph neural network embeddings on "reach-avoid derived" DFAs. Empirically, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization.
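To make the reach-avoid view concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a cDFA is a conjunction of DFAs (a trace satisfies the cDFA only if every component DFA accepts it) and that unlisted symbols leave a DFA's state unchanged. The names `DFA`, `reach_avoid_chain`, and `cdfa_accepts`, as well as the symbols `key`, `door`, `lava`, `coin`, and `guard`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A DFA over string symbols; (state, symbol) pairs not listed self-loop."""
    start: str
    accepting: frozenset
    transitions: dict  # maps (state, symbol) -> next state

    def step(self, state, symbol):
        # Assumption: symbols without an explicit transition are ignored.
        return self.transitions.get((state, symbol), state)

    def accepts(self, trace):
        state = self.start
        for symbol in trace:
            state = self.step(state, symbol)
        return state in self.accepting


def reach_avoid_chain(stages):
    """Build a DFA from a series of (reach, avoid) stages.

    In stage i, seeing `reach` advances to stage i + 1, while seeing
    `avoid` moves to a rejecting sink state named "fail".
    """
    transitions = {}
    for i, (reach, avoid) in enumerate(stages):
        transitions[(f"q{i}", reach)] = f"q{i + 1}"
        transitions[(f"q{i}", avoid)] = "fail"
    accepting = frozenset({f"q{len(stages)}"})
    return DFA(start="q0", accepting=accepting, transitions=transitions)


def cdfa_accepts(dfas, trace):
    """Assumed conjunctive semantics: every component DFA must accept."""
    return all(dfa.accepts(trace) for dfa in dfas)


# Two component tasks: (key, then door, never lava) and (coin, never guard).
task_a = reach_avoid_chain([("key", "lava"), ("door", "lava")])
task_b = reach_avoid_chain([("coin", "guard")])
cdfa = [task_a, task_b]

print(cdfa_accepts(cdfa, ["coin", "key", "door"]))          # True
print(cdfa_accepts(cdfa, ["key", "lava", "door", "coin"]))  # False
print(cdfa_accepts(cdfa, ["key", "door"]))                  # False
```

The second trace fails because `lava` is observed before `door`, which sends `task_a` to its rejecting sink; the third fails because `task_b` never sees `coin`. This illustrates the Boolean semantics noted in the abstract: changing a single symbol in a stage can flip whether a trace is accepted.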