Poster in Workshop: ML with New Compute Paradigms
Regularizing the Infinite: Improved Generalization Performance with Deep Equilibrium Models
Babak Rahmani · Jannes Gladrow · Kirill Kalinin · Heiner Kremer · Christos Gkantsidis · Hitesh Ballani
Implicit networks, such as Deep Equilibrium (DEQ) models, present unique opportunities for emerging computing paradigms. Unlike traditional feedforward (FFW) networks, DEQs adaptively adjust their compute resources, which has been shown to improve out-of-distribution generalization, especially on algorithmic tasks. We demonstrate that this generalization extends to robustness against noise, making DEQs well suited to new hardware, such as analog or optical architectures, that offers higher but noisier compute capabilities. But do DEQ models consistently outperform FFW networks in generalization? Surprisingly, our findings indicate that this advantage depends heavily on the specific task and network architecture. At equivalent network capacity, DEQ models become more beneficial as network depth increases, a trend that aligns with hardware systems optimized for deeper networks. We further demonstrate that regularizing the DEQ’s entire dynamic process, rather than only the random initialization or a prescribed threshold as in previous work, significantly enhances performance across a range of tasks, including image classification, function regression, adversarial training, and algorithmic extrapolation, making DEQs a compelling choice for next-generation hardware systems.
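For readers unfamiliar with implicit networks, the sketch below illustrates the core DEQ idea referenced in the abstract: instead of stacking explicit layers, a single weight-tied layer f is iterated until its output reaches a fixed point z* = f(z*, x), so compute adapts to the input. This is a minimal, hypothetical PyTorch example; the layer choice, sizes, solver, and stopping tolerance are illustrative assumptions, not the authors' configuration, and a practical DEQ would backpropagate through the equilibrium via the implicit function theorem rather than through the unrolled iterations.

```python
# Minimal DEQ forward-pass sketch (illustrative only; layer, sizes, and
# naive fixed-point solver are assumptions, not the paper's setup).
import torch
import torch.nn as nn


class SimpleDEQ(nn.Module):
    """Implicit layer: the output is the fixed point z* = f(z*, x)."""

    def __init__(self, dim: int, max_iters: int = 50, tol: float = 1e-4):
        super().__init__()
        # One weight-tied layer reused at every solver step.
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.max_iters = max_iters
        self.tol = tol

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.zeros_like(x)  # initial guess for the equilibrium
        for _ in range(self.max_iters):
            z_next = self.f(torch.cat([z, x], dim=-1))
            # Stop once the iterates stabilize: the number of steps,
            # and hence the compute, adapts to the input.
            if torch.norm(z_next - z) / (torch.norm(z) + 1e-8) < self.tol:
                return z_next
            z = z_next
        return z


if __name__ == "__main__":
    model = SimpleDEQ(dim=16)
    out = model(torch.randn(8, 16))
    print(out.shape)  # torch.Size([8, 16])
```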