Poster in Affinity Event: LatinX in AI
Evaluating Privacy Risks in Synthetic Clinical Text Generation in Spanish
Luis Miranda · Jocelyn Dunstan · Matías Toro · Federico Olmedo · Felix Melo
Leveraging medical data for Deep Learning models holds great potential, but ensuring the protection of sensitive patient information is paramount in the clinical domain. A widely used approach to balance data utility and privacy is the generation of synthetic text with Large Language Models (LLMs) under the framework of differential privacy (DP). Techniques like Differentially Private Stochastic Gradient Descent (DP-SGD) are typically considered to provide privacy guarantees, but they rely on specific conditions. This research demonstrates how memorization in LLMs can worsen when these privacy safeguards are not fully met, increasing the risk that personal and sensitive information leaks into synthetic clinical reports. Addressing these vulnerabilities could enhance the reliability of DP in protecting clinical text data while maintaining its utility.
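For reference, below is a minimal sketch of the DP-SGD mechanism the abstract refers to: each example's gradient is clipped to a fixed L2 norm and Gaussian noise is added before the optimizer step (Abadi et al., 2016). The model, data, and hyperparameters are illustrative placeholders and do not reflect the poster's actual experimental setup.

```python
# Minimal DP-SGD sketch: per-example gradient clipping + Gaussian noise.
# Hyperparameters (clip_norm, noise_multiplier) are illustrative assumptions.
import torch


def dp_sgd_step(model, loss_fn, batch_x, batch_y, optimizer,
                clip_norm=1.0, noise_multiplier=1.0):
    """One DP-SGD step: clip each per-example gradient, sum, add noise, step."""
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    # Microbatches of size 1 to obtain per-example gradients (slow but simple).
    for x, y in zip(batch_x, batch_y):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        # Clip the full per-example gradient to L2 norm `clip_norm`.
        total_norm = torch.sqrt(sum(p.grad.pow(2).sum() for p in params))
        scale = min(1.0, clip_norm / (total_norm.item() + 1e-12))
        for acc, p in zip(summed, params):
            acc += p.grad * scale

    # Add calibrated Gaussian noise, average, and apply the update.
    batch_size = len(batch_x)
    for p, acc in zip(params, summed):
        noise = torch.randn_like(acc) * noise_multiplier * clip_norm
        p.grad = (acc + noise) / batch_size
    optimizer.step()
```

The privacy guarantee depends on the clipping norm, noise multiplier, sampling rate, and number of steps being chosen and accounted for correctly; the poster's point is that when these conditions are not fully met, memorization and leakage risk can increase despite the nominal use of DP-SGD.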