Oral in Workshop: International Workshop on Federated Foundation Models in Conjunction with NeurIPS 2024 (FL@FM-NeurIPS'24)
Federated Learning with Generative Content
Rui Ye · Xinyu Zhu · Jingyi Chai · Lingjuan Lyu · Chen Xie · Yanfeng Wang · Siheng Chen
Federated learning (FL) enables leveraging distributed private data for model training in a collaborative and privacy-preserving way. However, the ubiquitous and notorious issue of data heterogeneity, where different data-owning clients hold heterogeneous datasets, fundamentally limits the performance of current FL methods. To address this issue, this paper explores a new direction, data-centric intervention, which directly enriches clients’ local data with generative content, fundamentally reducing the level of data heterogeneity. Following this idea, we propose a novel framework, federated learning with generative content (FedGC). FedGC is a simple yet effective framework in which each client trains its local model on both diverse generative data from advanced generative models and its original private data, guided by strategies distilled from our four-aspect analysis. FedGC offers two significant advantages: (1) it significantly mitigates data heterogeneity, as the diverse generative data prevents each client from overfitting to its client-specific private data; and (2) it contributes to better privacy preservation, as the introduced generative data dilutes the concentration of sensitive data in the enriched dataset, mitigating the risk of memorizing private information. Empirical studies across 9 baselines and 7 datasets demonstrate that FedGC consistently and significantly improves task performance and privacy preservation.
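To make the core idea concrete, below is a minimal PyTorch sketch of the data-centric intervention the abstract describes: each client trains on the union of its private data and generative data, and the server aggregates with plain FedAvg. This is an illustrative assumption of the pipeline, not the paper's actual implementation; in particular, `make_dataset` uses random tensors as stand-ins for both private and generated samples (FedGC would draw the latter from an advanced generative model), and the uniform averaging, model, and hyperparameters are placeholders.

```python
import copy
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset

def make_dataset(n, num_classes=10, dim=32):
    # Placeholder data so the sketch runs end to end; in FedGC the
    # generative portion would come from a pretrained generative model.
    return TensorDataset(torch.randn(n, dim), torch.randint(0, num_classes, (n,)))

def local_train(global_model, private_ds, generative_ds, epochs=1, lr=0.01):
    """One client's update: train on private data enriched with generative data."""
    model = copy.deepcopy(global_model)
    loader = DataLoader(ConcatDataset([private_ds, generative_ds]),
                        batch_size=32, shuffle=True)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def fedavg(states):
    """Uniform FedAvg aggregation of client state dicts (illustrative)."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return avg

global_model = nn.Linear(32, 10)
# Each client holds a (private, generative) pair of datasets.
clients = [(make_dataset(200), make_dataset(200)) for _ in range(5)]

for _round in range(3):  # a few communication rounds
    states = [local_train(global_model, priv, gen) for priv, gen in clients]
    global_model.load_state_dict(fedavg(states))
```

The sketch highlights why the intervention is data-centric: the only change relative to standard FedAvg is that each client's training set is enriched with generative samples before local training, so no aggregation rule or objective needs to be modified.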