

Poster in Workshop: Pluralistic Alignment Workshop

Trustworthy Human-AI Interaction Through Agreement Protocols

Natalie Collina · Surbhi Goel · Varun Gupta · Aaron Roth


Abstract:

We give an efficient reduction that converts any machine learning algorithm into an interactive protocol in which the model and a human decision maker collaborate to improve the human's predictions. In the protocol, the model first produces a prediction, which implies a recommended action for the human decision maker. The human can update their beliefs as a function of the recommendation they receive, but can also incorporate their own knowledge and observations, and so may disagree with the recommendation. The human then responds by conveying either agreement with the model's recommendation or directional disagreement. The model updates its state and provides a new recommendation, and the human may in turn update their beliefs given the new information they have learned from the model. The process continues until the model and the human reach agreement. We show that any predictive model can be efficiently turned into an agreement protocol, without reducing its accuracy, such that the number of rounds until agreement is small under forgiving, computationally tractable assumptions on the human's decision-making process. Moreover, the final decisions are guaranteed to be more accurate than those of either the model or the human alone. The assumptions we place on the human would be satisfied by a Bayesian decision maker, but are a substantial relaxation: they are algorithmically tractable and do not require perfect Bayesian rationality.
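To make the round structure concrete, here is a minimal sketch of an agreement-protocol loop. All names (`model_predict`, `human_respond`) are hypothetical, and the bisection-style update on a belief interval is an illustrative stand-in, not the paper's actual reduction; it only shows the recommend/feedback/update cycle the abstract describes.

```python
# Illustrative sketch of an agreement-protocol loop.
# ASSUMPTION: the interval-bisection update below is a toy stand-in
# for the model's state update, not the construction from the paper.

from typing import Callable, Literal

Feedback = Literal["agree", "higher", "lower"]


def agreement_protocol(
    model_predict: Callable[[float, float], float],
    human_respond: Callable[[float], Feedback],
    lo: float = 0.0,
    hi: float = 1.0,
    max_rounds: int = 100,
) -> float:
    """Run rounds until the human agrees with the model's recommendation.

    model_predict maps the model's current belief interval [lo, hi] to a
    recommendation; human_respond returns agreement, or the direction in
    which the human believes the true value lies.
    """
    for _ in range(max_rounds):
        recommendation = model_predict(lo, hi)
        feedback = human_respond(recommendation)
        if feedback == "agree":
            return recommendation
        # Directional disagreement shrinks the model's belief interval.
        if feedback == "higher":
            lo = recommendation
        else:
            hi = recommendation
    return model_predict(lo, hi)


if __name__ == "__main__":
    true_value = 0.8125  # known only to this demo's simulated human

    def model_predict(lo: float, hi: float) -> float:
        return (lo + hi) / 2  # midpoint recommendation

    def human_respond(rec: float, tol: float = 0.01) -> Feedback:
        if abs(rec - true_value) <= tol:
            return "agree"
        return "higher" if rec < true_value else "lower"

    print(agreement_protocol(model_predict, human_respond))
```

In this toy setting each round of directional feedback halves the belief interval, so agreement is reached in logarithmically many rounds; the paper's result establishes an analogous fast-agreement guarantee under its relaxed assumptions on the human.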
