Poster in Workshop: Statistical Frontiers in LLMs and Foundation Models
Conversational Question-Answering for process task guidance in manufacturing
Ramesh Manuvinakurike · Elizabeth Watkins · Celal Savur · Anthony Rhodes · Sovan Biswas · Richard Beckwith · Gesem Mejia · Saurav Sahay · Giuseppe Raffa · Lama Nachman
Keywords: [ Judge ] [ Augmentation ] [ Evaluation ]
In this work we explore using LLMs for data augmentation for a manufacturing task guidance system. The dataset consists of representative samples of interactions with technicians working in an advanced manufacturing setting. The purpose of this work is to explore the task, data augmentation for the supported tasks, and the performance of existing LLMs. We observe that the task is complex, requiring an understanding of procedure specification documents and of actions and objects sequenced temporally. The dataset consists of 200,000+ question/answer pairs that refer to the spec document and are grounded in narrations and/or video demonstrations. We developed a “baseline” with each of several popular open-source LLMs, compared their responses in a reference-free setting using LLM-as-a-judge, and compared the judge's ratings with those of crowd-workers while validating the ratings with experts.
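The comparison between LLM-as-a-judge ratings and crowd-worker ratings could be quantified in several ways. The abstract does not say which agreement measure was used, so the following is only a minimal sketch assuming categorical ratings (for example a 1 to 5 scale) and using raw agreement and Cohen's kappa as stand-in measures. The function names and the sample ratings are hypothetical and are not taken from the work.

```python
from collections import Counter


def agreement_rate(judge, crowd):
    """Fraction of items on which the two raters give the same rating."""
    if len(judge) != len(crowd) or not judge:
        raise ValueError("rating lists must be non-empty and of equal length")
    return sum(j == c for j, c in zip(judge, crowd)) / len(judge)


def cohens_kappa(judge, crowd):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(judge)
    p_observed = agreement_rate(judge, crowd)
    judge_counts, crowd_counts = Counter(judge), Counter(crowd)
    labels = judge_counts.keys() | crowd_counts.keys()
    # Chance agreement: probability both raters pick the same label independently.
    p_expected = sum(judge_counts[k] * crowd_counts[k] for k in labels) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical ratings for six responses (illustrative values only).
    llm_judge_ratings = [5, 4, 4, 2, 3, 5]
    crowd_ratings = [5, 4, 3, 2, 3, 4]
    print("agreement:", agreement_rate(llm_judge_ratings, crowd_ratings))
    print("kappa:", cohens_kappa(llm_judge_ratings, crowd_ratings))
```

Kappa treats the ratings as unordered categories. If the rating scale is ordinal, a rank correlation or a weighted kappa may be a better fit.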