

Poster in Affinity Event: LatinX in AI

Evaluating Privacy Risks in Synthetic Clinical Text Generation in Spanish

Luis Miranda · Jocelyn Dunstan · Matías Toro · Federico Olmedo · Felix Melo


Abstract:

Leveraging medical data for Deep Learning models holds great potential, but protecting sensitive patient information is paramount in the clinical domain. A widely used approach to balance data utility and privacy is the generation of synthetic text with Large Language Models (LLMs) under the framework of differential privacy (DP). Techniques such as Differentially Private Stochastic Gradient Descent (DP-SGD) are typically considered to provide privacy guarantees, but those guarantees hold only under specific conditions. This research demonstrates how memorization in LLMs can intensify when these privacy safeguards are not fully met, increasing the risk that personal and sensitive information leaks into synthetic clinical reports. Addressing these vulnerabilities could enhance the reliability of DP in protecting clinical text data while maintaining its utility.
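For context on the technique named above: DP-SGD obtains its (epsilon, delta) guarantee by clipping per-sample gradients and adding calibrated Gaussian noise during training, and the guarantee only holds when those conditions are actually enforced. The following is a minimal illustrative sketch (not taken from the poster) of DP-SGD fine-tuning in PyTorch with the Opacus library; the toy model, dataset, and hyperparameters are assumptions for demonstration only.

# Illustrative sketch: training a toy model with DP-SGD via Opacus.
# All names, hyperparameters, and the synthetic dataset are assumptions,
# not details from the poster.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-in for clinical features: random inputs with binary labels.
features = torch.randn(256, 32)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# DP-SGD wrapper: per-sample gradient clipping plus Gaussian noise.
# The resulting (epsilon, delta) guarantee depends on these parameters
# being applied correctly throughout training.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # scale of noise added to clipped gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Privacy budget spent so far, for a chosen delta.
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"(epsilon, delta) = ({epsilon:.2f}, 1e-5)")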
