Poster in Workshop: Generative AI for Education (GAIED): Advances, Opportunities, and Challenges
Paper 43: Large language model augmented exercise retrieval for personalized language learning
Austin Xu · Klinton Bicknell · Will Monroe
Keywords: [ Personalization ] [ Online language learning ] [ Zero-shot exercise retrieval ] [ Large language models ]
We study the problem of zero-shot multilingual exercise retrieval in the context of online language learning, with the goal of giving learners the ability to explicitly request personalized exercises via natural language. Using real-world data collected from language learners, we observe that vector-similarity approaches poorly capture the relationship between exercise content and the language learners use to express what they want to learn. This semantic gap between queries and content dramatically reduces the effectiveness of general-purpose retrieval models pretrained on large-scale information retrieval datasets. We leverage the generative capabilities of large language models to bridge this gap by synthesizing hypothetical exercises based on the learner's input, which are then used to search for relevant exercises. Our approach, which we call mHyER, outperforms several strong baselines, such as Contriever, on a novel benchmark created from publicly available Tatoeba data.
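To make the retrieval-by-hypothetical-exercise idea concrete, the sketch below illustrates the general pattern the abstract describes: an LLM first synthesizes candidate exercises from the learner's natural-language request, and those synthetic exercises (rather than the raw query) are embedded and matched against the exercise bank. This is a minimal illustration under assumptions, not the authors' mHyER implementation; the encoder choice, the averaging step, and the `generate_hypothetical_exercises` helper are all placeholders.

```python
# Illustrative sketch of query -> hypothetical exercises -> dense retrieval.
# The encoder, pooling strategy, and LLM call are assumptions, not the paper's method.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # assumed encoder


def generate_hypothetical_exercises(query: str, n: int = 3) -> list[str]:
    """Placeholder for an LLM call that writes n short exercises matching the
    learner's request (e.g., 'I want to practice ordering food in Spanish')."""
    raise NotImplementedError("Plug in an LLM API of your choice here.")


def retrieve(query: str, exercise_bank: list[str], k: int = 5) -> list[str]:
    # 1. Bridge the query-content semantic gap by synthesizing hypothetical exercises.
    hypotheticals = generate_hypothetical_exercises(query)
    # 2. Embed the hypotheticals and pool them into a single search vector (mean pooling assumed).
    search_vec = embedder.encode(hypotheticals).mean(axis=0)
    # 3. Embed the real exercise bank and rank by cosine similarity.
    bank_vecs = embedder.encode(exercise_bank)
    sims = bank_vecs @ search_vec / (
        np.linalg.norm(bank_vecs, axis=1) * np.linalg.norm(search_vec) + 1e-9
    )
    return [exercise_bank[i] for i in np.argsort(-sims)[:k]]
```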