Talk in Workshop: Decentralization and Trustworthy Machine Learning in Web3: Methodologies, Platforms, and Applications
Invited Talk: Virginia Smith - Practical Approaches for Private Adaptive Optimization
Adaptive optimizers (e.g., AdaGrad, Adam) are widely used in machine learning. Despite their success in non-private training, the benefits of adaptivity tend to degrade when training with differential privacy for applications such as federated learning. We explore two simple techniques to improve the performance of private adaptive optimizers. First, we study the use of side information as a way to precondition gradients and effectively approximate gradient geometry. In cases where such side information is not available, we then propose differentially private adaptive training with delayed preconditioners (DP^2), a simple method that constructs delayed but less noisy preconditioners to realize the benefits of adaptivity. We analyze both approaches in theory and in practice, showing that these practical techniques can recover many of the benefits of adaptivity that are otherwise lost when applying state-of-the-art optimizers in private settings.
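To make the delayed-preconditioner idea concrete, the sketch below is a toy, single-machine approximation under stated assumptions; it is not the talk's actual DP^2 algorithm, and all names and hyperparameters (e.g., `delay`, `noise_mult`, the AdaGrad-style accumulation) are illustrative. The point it shows: gradients are clipped and noised as in DP-SGD, while the preconditioner is refreshed only periodically from accumulated noisy statistics, so the noise in the second-moment estimate partially averages out.

```python
import numpy as np

def delayed_preconditioner_dp_sketch(grad_fn, w, steps=1000, lr=0.1, clip=1.0,
                                     noise_mult=1.0, delay=100, eps=1e-8, rng=None):
    """Hedged sketch of DP training with a delayed, AdaGrad-style preconditioner.

    Each step: clip the gradient, add Gaussian noise (standard DP-SGD treatment),
    and take an adaptive step. The preconditioner is NOT refreshed every step;
    squared noisy gradients are accumulated and the preconditioner is rebuilt only
    every `delay` steps, so it is stale but less noisy.
    """
    rng = np.random.default_rng() if rng is None else rng
    precond = np.ones_like(w)     # current (delayed) preconditioner
    accum = np.zeros_like(w)      # accumulator of squared noisy gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        # Clip to bound per-step sensitivity, then add Gaussian noise.
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
        g_noisy = g + rng.normal(0.0, noise_mult * clip, size=g.shape)
        # Accumulate second-moment statistics from the noisy gradients.
        accum += g_noisy ** 2
        # Refresh the preconditioner only every `delay` steps (delayed update).
        if t % delay == 0:
            precond = np.sqrt(accum / delay) + eps
            accum = np.zeros_like(w)
        # Adaptive update using the stale but less noisy preconditioner.
        w = w - lr * g_noisy / precond
    return w

# Toy usage on a quadratic objective (illustrative only).
if __name__ == "__main__":
    A = np.diag([10.0, 1.0, 0.1])
    grad_fn = lambda w: A @ w
    w_final = delayed_preconditioner_dp_sketch(grad_fn, w=np.ones(3))
    print(w_final)
```

The design choice being illustrated is the trade-off named in the abstract: a delayed preconditioner is out of date, but averaging many noisy gradients before each refresh reduces the variance injected by the privacy mechanism, which is what lets the adaptive geometry survive private training.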