NeurIPS Multi-Agent Vulnerabilities in Superhuman AI

Keynote Talk in Workshop: Multi-Agent Security: Security as Key to AI Safety

Multi-Agent Vulnerabilities in Superhuman AI

Adam Gleave

Abstract:

Game-playing systems were among the first AI systems to reach superhuman performance, beating professionals in competitive games like chess and Go. If AIs are robust in any setting, we would expect them to be robust in such zero-sum games, where performance is almost synonymous with lack of exploitability. However, we recently found that a variety of superhuman Go AIs are vulnerable to a simple adversarial strategy. In this talk, we will outline a threat model for multi-agent adversarial attacks and discuss prior vulnerabilities discovered under this threat model, before diving into vulnerabilities in Go AIs. We will conclude by discussing possible mitigations to improve robustness.
