NeurIPS Poster WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena

HAIPENG LUO · Qingfeng Sun · Can Xu · Pu Zhao · Qingwei Lin · Jian-Guang Lou · Shifeng Chen · Yansong Tang · Weizhu Chen

East Exhibit Hall A-C #3004

Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

Recent work demonstrates that post-training large language models with open-domain instruction-following data has achieved colossal success. At the same time, the human-judged Chatbot Arena has emerged as one of the most reasonable benchmarks for model evaluation and developmental guidance. However, manually curating high-quality training data and relying on online human evaluation platforms are both expensive and limited. To reduce the manual and time costs associated with post-training, this paper introduces a simulated Chatbot Arena named WizardArena, which is fully based on and powered by open-source LLMs. In the evaluation scenario, WizardArena can efficiently predict accurate performance rankings among different models based on an offline test set. In the training scenario, we simulate arena battles among various state-of-the-art models on a large scale of instruction data, then leverage the battle results to continually enhance the target model in both supervised fine-tuning and reinforcement learning. Experimental results demonstrate that WizardArena aligns closely with the online human arena rankings, and that our models trained on extensive offline battle data exhibit significant performance improvements during the SFT, DPO, and PPO stages.
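To make the evaluation scenario concrete, here is a minimal sketch of how pairwise battle outcomes could be turned into a model ranking. It assumes a sequential Elo-style update, which the abstract does not specify; the function name `elo_rankings`, the parameters `k` and `base`, and the model names in the usage note are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def elo_rankings(battles, k=32.0, base=1000.0):
    """Rank models from pairwise battle outcomes with sequential Elo updates.

    battles: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". This is an assumed data layout, not the paper's.
    Returns a list of (model, rating) sorted from strongest to weakest.
    """
    ratings = defaultdict(lambda: base)
    outcome_to_score = {"a": 1.0, "b": 0.0, "tie": 0.5}

    for model_a, model_b, winner in battles:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of model_a under the standard Elo logistic curve.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = outcome_to_score[winner]
        # Zero-sum update: whatever model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta

    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

For example, calling `elo_rankings([("m1", "m2", "a"), ("m1", "m3", "a"), ("m2", "m3", "a")])` returns the three hypothetical models ordered by rating. In a simulated arena, the `winner` field would come from a judge model rather than from human votes.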
