

Poster in Workshop: Statistical Frontiers in LLMs and Foundation Models

When is Differentially Private Finetuning Actually Private?

Roy Rinberg · Martin Pawelczyk

Keywords: Differential Privacy, Finetuning, LLMs

Sat 14 Dec, 12:00 p.m. to 12:45 p.m. PST

Abstract: Differential Privacy (DP) is a mathematical definition that provides a formal guarantee that the output of a query does not depend greatly on any single individual in the dataset. Critically, DP does not formalize any notion of "background information," and it provides no guarantee about how identifying an output can be to someone who already has background information about an individual. In this work, we argue that in the setting of privately finetuning Large Language Models (LLMs), where a large model has already been trained on data and is then finetuned on a private dataset with differential privacy, the resulting DP guarantee is not always semantically meaningful. We argue that simply providing a differential privacy $(\epsilon, \delta)$ guarantee is insufficient to provide meaningful human notions of privacy when the original training data is correlated with the finetuning dataset. In particular, we argue that a differential privacy guarantee should be reported alongside a measure of dataset similarity and of model capacity. This is a work in progress and primarily a position piece: it argues for how DP should be used in practice and outlines the future research needed to better answer these questions.
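For reference, the standard $(\epsilon, \delta)$-DP definition the abstract relies on can be stated as below. The notation is ours, not the authors': as an illustrative assumption, $M$ stands for the finetuning procedure with the pretrained weights held fixed, $D$ and $D'$ are finetuning datasets differing in one individual's record, and $S$ is any set of possible outputs (finetuned models).

```latex
% (\epsilon, \delta)-differential privacy for a randomized mechanism M:
% for all neighboring datasets D, D' (differing in one individual's record)
% and all measurable sets S of outputs,
\Pr[M(D) \in S] \;\le\; e^{\epsilon} \, \Pr[M(D') \in S] + \delta
```

Read this way, the inequality only compares neighboring finetuning datasets; it says nothing about data that entered the model before finetuning, which is the gap the abstract highlights when the pretraining data is correlated with the finetuning dataset.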
