Oral presentation at The First Workshop on Large Foundation Models for Educational Assessment
Automated Feedback Generation for Open-Ended Questions: Insights from Fine-Tuned LLMs
Elisabetta Mazzullo · Okan Bulut
Timely, personalized, and actionable feedback is essential for effective learning but challenging to deliver at scale. Automated feedback generation (AFG) using large language models (LLMs) is a promising solution to this challenge. While existing studies using out-of-the-box LLMs and prompting strategies have shown encouraging results, there is room for improvement. This study investigates fine-tuning OpenAI's GPT-3.5-turbo for AFG. We hand-crafted a small set of feedback examples for open-ended situational judgment questions and used it, together with specific prompting strategies, to fine-tune the pre-trained LLM. In an evaluation conducted by independent judges and test experts, the feedback generated by our fine-tuned GPT-3.5-turbo model achieved high user satisfaction (84.8%) and met key structural quality criteria (72.9%). The model also generalized effectively across items, providing feedback consistent with the instructions regardless of the respondent's performance level, English proficiency, or student status. However, some feedback statements still contained linguistic errors, lacked focused suggestions, or seemed generic. We discuss potential solutions to these issues, along with implications for developing LLM-supported AFG systems and for their adoption in high-stakes settings.
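To make the fine-tuning workflow described in the abstract concrete, the sketch below shows how a small set of hand-crafted feedback examples could be formatted and submitted through OpenAI's fine-tuning API. This is a minimal illustration, not the authors' implementation: the file name feedback_examples.jsonl, the system prompt wording, and the placeholder item text are all hypothetical.

    # Minimal sketch of fine-tuning GPT-3.5-turbo on hand-crafted feedback
    # examples via the OpenAI API. The training file and prompt wording are
    # hypothetical; the paper's actual data and prompting strategies are not
    # reproduced here.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Each training example pairs a prompting strategy (system + user turns)
    # with a hand-crafted feedback statement as the target assistant turn.
    example = {
        "messages": [
            {"role": "system",
             "content": "You are a tutor giving timely, personalized, and "
                        "actionable feedback on situational judgment questions."},
            {"role": "user",
             "content": "Question: <item text>\nResponse: <student answer>"},
            {"role": "assistant",
             "content": "<hand-crafted feedback statement>"},
        ]
    }
    with open("feedback_examples.jsonl", "w") as f:
        f.write(json.dumps(example) + "\n")  # one JSON object per line

    # Upload the examples and start a fine-tuning job.
    training_file = client.files.create(
        file=open("feedback_examples.jsonl", "rb"),
        purpose="fine-tune",
    )
    job = client.fine_tuning.jobs.create(
        training_file=training_file.id,
        model="gpt-3.5-turbo",
    )
    print(job.id)  # poll this job until it yields a fine-tuned model name

Once the job completes, the returned fine-tuned model name can be used with the standard chat completions endpoint in place of the base model.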