

Poster in Affinity Event: LatinX in AI

Evaluating Privacy Risks in Synthetic Clinical Text Generation in Spanish

Luis Miranda · Jocelyn Dunstan · Matías Toro · Federico Olmedo · Felix Melo


Abstract:

Leveraging medical data for Deep Learning models holds great potential, but protecting sensitive patient information is paramount in the clinical domain. A widely used approach to balance data utility and privacy is the generation of synthetic text with Large Language Models (LLMs) under the framework of differential privacy (DP). Techniques such as Differentially Private Stochastic Gradient Descent (DP-SGD) are typically considered to provide privacy guarantees, but those guarantees hold only under specific conditions. This research demonstrates how memorization in LLMs can intensify when these privacy safeguards are not fully met, increasing the risk that personal and sensitive information leaks into synthetic clinical reports. Addressing these vulnerabilities could enhance the reliability of DP in protecting clinical text data while maintaining its utility.
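For context on the technique named above: DP-SGD obtains its (epsilon, delta) guarantee by clipping per-sample gradients and adding calibrated Gaussian noise during training, and the guarantee only holds when those conditions are actually enforced. The following is a minimal illustrative sketch (not taken from the poster) of DP-SGD fine-tuning in PyTorch with the Opacus library; the toy model, dataset, and hyperparameters are assumptions for demonstration only.

# Illustrative sketch: training a toy model with DP-SGD via Opacus.
# All names, hyperparameters, and the synthetic dataset are assumptions,
# not details from the poster.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-in for clinical features: random inputs with binary labels.
features = torch.randn(256, 32)
labels = torch.randint(0, 2, (256,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# DP-SGD wrapper: per-sample gradient clipping plus Gaussian noise.
# The resulting (epsilon, delta) guarantee depends on these parameters
# being applied correctly throughout training.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # scale of noise added to clipped gradients
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Privacy budget spent so far, for a chosen delta.
epsilon = privacy_engine.get_epsilon(delta=1e-5)
print(f"(epsilon, delta) = ({epsilon:.2f}, 1e-5)")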
