

Poster in Workshop: Workshop on Machine Learning Safety

Towards Defining Deception in Structural Causal Games

Francis Ward


Abstract:

Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals. There are a number of existing definitions of deception in the literature on game theory and symbolic AI, but there is no overarching theory of deception for learning agents in games. We introduce a functional definition of deception in structural causal games, grounded in the philosophical literature. We present several examples to establish that our formal definition captures philosophical and commonsense desiderata for deception.
