Keynote Talk in Workshop: Multi-Agent Security: Security as Key to AI Safety
Multi-Agent Vulnerabilities in Superhuman AI
Adam Gleave
Abstract:
Game-playing systems were among the first AI systems to reach superhuman performance, beating professionals in competitive games like chess and Go. If AIs are robust in any setting, we would expect them to be robust in such zero-sum games, where performance is almost synonymous with lack of exploitability. However, we recently found that a variety of superhuman Go AIs are vulnerable to a simple adversarial strategy. In this talk, we will outline a threat model for multi-agent adversarial attacks and discuss prior vulnerabilities discovered under this threat model, before diving into vulnerabilities in Go AIs. We will conclude by discussing possible mitigations to improve robustness.