

Poster
in
Workshop: Statistical Frontiers in LLMs and Foundation Models

vTune: Verifiable Fine-Tuning Through Backdooring

Eva Zhang · Akilesh Potti · Micah Goldblum

Keywords: [ auto-eval ] [ statistical measure ] [ safety ] [ verification ] [ backdoor ] [ audit ] [ fine-tuning ]

Sat 14 Dec 3:45 p.m. PST — 4:30 p.m. PST

Abstract:

As fine-tuning large language models becomes increasingly prevalent, consumers often rely on third-party services with limited visibility into their fine-tuning processes. This lack of transparency raises the question: how can consumers verify that fine-tuning services are performed correctly? We present vTune, a novel statistical framework that allows a user to verify that an external provider indeed fine-tuned a custom model specifically for that user. vTune induces a backdoor in models fine-tuned on the client's data and includes an efficient statistical detector. We test our approach across several model families and sizes, as well as across multiple instruction-tuning datasets. We detect fine-tuned models with p-values on the order of 10^-45, adding as few as 1,600 additional tokens to the training set, requiring no more than 10 inference calls to verify, and preserving the resulting model's performance across multiple benchmarks. vTune typically costs between $1 and $3 to implement on popular fine-tuning services.
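The abstract describes the verification flow at a high level: plant a small number of backdoor examples in the client's training data, then query the returned model a handful of times and test statistically whether the backdoor was learned. The sketch below illustrates this idea only; the helper names (plant_backdoor, query_model), the trigger/signature format, and the per-query chance probability are illustrative assumptions, not the paper's actual method or API.

```python
# Minimal sketch of backdoor-based fine-tuning verification, under assumed
# trigger/signature pairs and an assumed null probability of a chance match.
import math

def plant_backdoor(dataset, triggers, signatures):
    """Append hypothetical (trigger prompt, signature completion) pairs to the
    client's fine-tuning dataset before sending it to the provider."""
    return dataset + [{"prompt": t, "completion": s}
                      for t, s in zip(triggers, signatures)]

def verify(query_model, triggers, signatures, p_chance=1e-5):
    """Query the returned model on each trigger (a small number of inference
    calls) and compute a binomial p-value under the null hypothesis that each
    signature appears by chance with probability p_chance."""
    matches = sum(signatures[i] in query_model(t) for i, t in enumerate(triggers))
    n = len(triggers)
    # Upper-tail binomial probability: P(X >= matches) under the null.
    p_value = sum(math.comb(n, k) * p_chance**k * (1 - p_chance)**(n - k)
                  for k in range(matches, n + 1))
    return matches, p_value
```

Under these assumptions, observing most triggers produce their signatures yields a vanishingly small p-value, consistent with the strong detection rates reported in the abstract; a model that was never fine-tuned on the planted data would rarely emit the signatures.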
