Oral presentation at The First Workshop on Large Foundation Models for Educational Assessment
Automated Feedback Generation for Open-Ended Questions: Insights from Fine-Tuned LLMs
Elisabetta Mazzullo · Okan Bulut
Timely, personalized, and actionable feedback is essential for effective learning but challenging to deliver at scale. Automated feedback generation (AFG) using large language models (LLMs) is a promising solution to this challenge. While existing studies using out-of-the-box LLMs and prompting strategies have shown encouraging results, there is room for improvement. This study investigates fine-tuning OpenAI's GPT-3.5-turbo for AFG. We hand-crafted a small set of feedback examples for open-ended situational judgment questions and used it, together with specific prompting strategies, to fine-tune the pre-trained LLM. In an evaluation conducted by independent judges and test experts, the feedback generated by our fine-tuned GPT-3.5-turbo model achieved high user satisfaction (84.8%) and met key structural quality criteria (72.9%). The model also generalized effectively across items, providing feedback consistent with the instructions regardless of the respondent's performance level, English proficiency, or student status. However, some feedback statements still contained linguistic errors, lacked focused suggestions, or seemed generic. We discuss potential solutions to these issues, along with implications for developing LLM-supported AFG systems and for their adoption in high-stakes settings.
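To make the fine-tuning workflow described in the abstract concrete, the sketch below shows how a small set of hand-crafted feedback examples could be formatted and submitted through OpenAI's fine-tuning API. This is a minimal illustration, not the authors' implementation: the file name feedback_examples.jsonl, the system prompt wording, and the placeholder item text are all hypothetical.

    # Minimal sketch of fine-tuning GPT-3.5-turbo on hand-crafted feedback
    # examples via the OpenAI API. The training file and prompt wording are
    # hypothetical; the paper's actual data and prompting strategies are not
    # reproduced here.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Each training example pairs a prompting strategy (system + user turns)
    # with a hand-crafted feedback statement as the target assistant turn.
    example = {
        "messages": [
            {"role": "system",
             "content": "You are a tutor giving timely, personalized, and "
                        "actionable feedback on situational judgment questions."},
            {"role": "user",
             "content": "Question: <item text>\nResponse: <student answer>"},
            {"role": "assistant",
             "content": "<hand-crafted feedback statement>"},
        ]
    }
    with open("feedback_examples.jsonl", "w") as f:
        f.write(json.dumps(example) + "\n")  # one JSON object per line

    # Upload the examples and start a fine-tuning job.
    training_file = client.files.create(
        file=open("feedback_examples.jsonl", "rb"),
        purpose="fine-tune",
    )
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )
    print(job.id)  # poll this job until it yields a fine-tuned model name

Once the job completes, the returned fine-tuned model name can be used with the standard chat completions endpoint in place of the base model.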