Poster in Workshop: Intrinsically Motivated Open-ended Learning (IMOL)
A Multi-agent Reinforcement Learning Study of Evolution of Communication and Teaching under Libertarian and Utilitarian Governing Systems
Aslan Satary Dizaji
Keywords: [ AI-Economist ] [ Governing Systems ] [ Behaviour Simulation ] [ Communication/Teaching ] [ Multi-agent Reinforcement Learning ]
Laboratory experiments have shown that communication plays an important role in solving social dilemmas. Here, by extending the AI-Economist, a mixed-motive multi-agent reinforcement learning environment, we ask the following descriptive question: which governing system facilitates the emergence and evolution of communication and teaching among agents? To answer this question, the AI-Economist is extended with a voting mechanism to simulate three governing systems along the individualistic-collectivistic axis, ranging from Full-Libertarian to Full-Utilitarian. In the original AI-Economist framework, agents build houses individually by collecting material resources from their environment. Here, the AI-Economist is further extended to include communication with possible misalignment (a variant of the signalling game) by letting agents build houses together if they name mutually complementary material resources by the same letter. A further extension adds teaching with possible misalignment (again a variant of the signalling game): half of the agents are teachers, who know how to use mutually complementary material resources to build houses but cannot build houses themselves, and the other half are students, who lack this information but can build those houses if the teachers teach them. The results show that a collectivistic environment such as the Full-Utilitarian system is more favourable for the emergence of communication and teaching, or more precisely, for the evolution of language alignment. Finally, a discussion is provided to justify this result in the simulation environment of this paper.
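The naming condition in the communication extension can be illustrated with a minimal sketch. It is not the paper's implementation: the resource names (`wood`, `stone`), the letter vocabulary, and the functions `alignment` and `can_build_together` are hypothetical, and the sketch assumes that a joint build succeeds only when both agents use the same letter for each of the two complementary resources.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical type: maps a resource name to the letter an agent uses for it.
Naming = Dict[str, str]


def alignment(a: Naming, b: Naming, resources: Iterable[str]) -> float:
    """Fraction of the given resources that both agents name with the same letter."""
    resources = list(resources)
    if not resources:
        return 0.0
    # A resource counts as aligned only if both agents have a name for it
    # and the two names are identical.
    matches = sum(1 for r in resources if r in a and a.get(r) == b.get(r))
    return matches / len(resources)


def can_build_together(a: Naming, b: Naming, pair: Tuple[str, str]) -> bool:
    """Assumed rule: a joint build requires full agreement on both resources."""
    return alignment(a, b, pair) == 1.0


# Illustrative agents; the names and letters are made up for this example.
agent_1 = {"wood": "A", "stone": "B"}
agent_2 = {"wood": "A", "stone": "C"}  # misaligned with agent_1 on "stone"
agent_3 = {"wood": "A", "stone": "B"}  # fully aligned with agent_1
```

Under these assumptions, `agent_1` and `agent_3` could build a house together, while `agent_1` and `agent_2` could not, because they disagree on the letter for one resource. A population-level average of `alignment` is one simple way to track the language alignment that the abstract reports.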