Poster
in
Workshop: Generative AI for Education (GAIED): Advances, Opportunities, and Challenges
Paper 44: Evaluating ChatGPT-generated Textbook Questions using IRT
Shreya Bhandari · Yunting Liu · Zachary Pardos
Keywords: [ Psychometrics ] [ Generative AI ] [ Measurement ] [ ChatGPT ] [ Education ] [ Large Language Models ] [ Question Generation ] [ Linking ] [ IRT ] [ Algebra ]
We aim to test the ability of ChatGPT to generate educational assessment questions given solely a summarization of textbook content. We take a psychometric measurement approach to comparing the quality of questions, or items, generated by ChatGPT against gold-standard questions from a published textbook. We use Item Response Theory (IRT) to analyze data from 207 test respondents answering questions from OpenStax College Algebra. Using a common-item linking design, we find that ChatGPT items fared as well as or better than textbook items, showing a better ability to distinguish within the moderate-ability group and higher discriminating power than OpenStax items (discrimination of 1.92 for ChatGPT vs. 1.54 for OpenStax).
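For readers unfamiliar with the two-parameter logistic (2PL) parameterization behind these discrimination estimates, the minimal sketch below shows how the reported values of 1.92 and 1.54 shape an item characteristic curve. The difficulty parameter `b = 0.0` is a hypothetical placeholder for illustration, not a value from the study, and the code is not the authors' analysis pipeline.

```python
import numpy as np

def irt_2pl(theta, a, b):
    """2PL IRT model: probability of a correct response given
    ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Discrimination values reported in the abstract; the difficulty
# b = 0.0 is a hypothetical placeholder, not an estimate from the paper.
a_chatgpt, a_openstax = 1.92, 1.54
b = 0.0

for theta in np.linspace(-2, 2, 9):
    p_gpt = irt_2pl(theta, a_chatgpt, b)
    p_osx = irt_2pl(theta, a_openstax, b)
    print(f"theta={theta:+.1f}  P(correct | ChatGPT item)={p_gpt:.2f}  "
          f"P(correct | OpenStax item)={p_osx:.2f}")
```

A higher discrimination produces a steeper slope near theta = b, meaning the item separates respondents of adjacent ability levels more sharply, which is what the larger ChatGPT estimate implies.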