NeurIPS Neural Interactive Proofs

Poster in Workshop: Towards Safe & Trustworthy Agents

Neural Interactive Proofs

Lewis Hammond · Sam Adam-Day

Abstract:

We consider the problem of how a trusted but computationally bounded agent (a 'verifier') can learn to interact with one or more powerful but untrusted agents ('provers') in order to solve a given task. More specifically, we study the case in which agents are represented using neural networks, and we refer to solutions of this problem as neural interactive proofs. First, we introduce a unifying framework based on prover-verifier games (Anil et al., 2021), which generalises previously proposed interaction protocols. We then describe several new protocols for generating neural interactive proofs and provide a (theoretical) comparison of both new and existing approaches. In so doing, we aim to create a foundation for future work on neural interactive proofs and their application in building safer AI systems.
