Poster in Workshop: AI meets Moral Philosophy and Moral Psychology: An Interdisciplinary Dialogue about Computational Ethics
#15: A reinforcement-learning meta-control architecture based on the dual-process theory of moral decision-making
Maximilian Maier · Vanessa Cheung · Falk Lieder
Keywords: [ alignment ] [ reinforcement learning ] [ moral psychology ] [ cognitive modelling ]
Deep neural networks are increasingly tasked with making complex, real-world decisions that can have morally significant consequences. But it is difficult to predict when a deep neural network will go wrong, and wrong decisions can cause severely negative outcomes. In contrast, human moral decision-making is often remarkably robust. This robustness is achieved in part by relying on both moral rules and cost-benefit reasoning. In this paper, we reverse-engineer people's capacity for robust moral decision-making as a cognitively inspired reinforcement-learning (RL) architecture that learns how much weight to give to following rules versus cost-benefit reasoning. We confirm the predictions of our model in a large online experiment on human moral learning. We find that our RL architecture can capture how people learn to make moral decisions, suggesting that it could be applied to make AI decision-making safer and more robustly beneficial to society.
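As a rough illustration of the kind of meta-control the abstract describes, the sketch below shows a dual-process agent that combines a rule-based value signal with a cost-benefit value signal through a learned arbitration weight, updated from reward feedback. All names here (`rule_values`, `cost_benefit_values`, the softmax choice rule, and the particular weight-update rule) are illustrative assumptions, not the authors' published model.

```python
# Hypothetical sketch of a dual-process meta-control RL agent.
# The arbitration weight w trades off rule-following vs. cost-benefit
# reasoning and is itself adjusted by reinforcement learning.
import numpy as np

rng = np.random.default_rng(0)


def rule_values(actions):
    # Assumed encoding: +1 if an action complies with a moral rule, -1 if it violates one.
    return np.array([1.0 if a["complies_with_rule"] else -1.0 for a in actions])


def cost_benefit_values(actions):
    # Assumed to be given: expected utility of each action's outcomes.
    return np.array([a["expected_utility"] for a in actions])


class DualProcessAgent:
    def __init__(self, lr=0.1, temperature=1.0):
        self.w = 0.5              # meta-control weight on rules vs. cost-benefit reasoning
        self.lr = lr
        self.temperature = temperature

    def action_probs(self, actions):
        # Blend the two value signals, then choose via a softmax.
        q = self.w * rule_values(actions) + (1.0 - self.w) * cost_benefit_values(actions)
        z = np.exp(q / self.temperature)
        return z / z.sum()

    def act(self, actions):
        return rng.choice(len(actions), p=self.action_probs(actions))

    def update(self, actions, chosen, reward):
        # Shift the arbitration weight toward whichever process assigned
        # the rewarded action a higher value (a simple assumed update rule).
        delta = reward * (rule_values(actions)[chosen] - cost_benefit_values(actions)[chosen])
        self.w = float(np.clip(self.w + self.lr * delta, 0.0, 1.0))
```

In use, the agent would repeatedly face choice sets, pick an action with `act`, receive a (possibly morally weighted) reward, and call `update`, so that over trials the weight `w` drifts toward the decision process that yields better outcomes in that environment.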