

Invited Talk in Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization

Crafting Computational Efficiency for Large Models: Training Recipes, Scaling Strategies and Sparsity Sorcery with Specialized Hardware

Natalia Vassilieva


Abstract:

Large models are shifting “what’s possible” with AI. Brute-force scaling of model parameter count increases model capacity and, when presented with enough training data, has shown remarkable results. However, the advantages of large-scale models come at the price of a steep increase in system complexity and infrastructure cost. Training and serving these models is an engineering challenge and is very expensive. Even minor errors in model design or training procedure can result in a significant waste of resources. At Cerebras we have trained our share of large language models and learned along the way how to train these models efficiently to get “the biggest bang for the buck”. In this talk we will share our experience and insights from training various LLMs. In addition to techniques for compute-efficient training of dense models, we will look into the benefits of sparse training and inference on Cerebras hardware, which is designed to take full advantage of all types of sparsity.

Speaker's Bio: Natalia Vassilieva is a Sr. Director of Product at Cerebras Systems, a computer systems company dedicated to accelerating deep learning. She leads the vision and strategy for Cerebras products, market, application, and algorithm analysis for machine learning use cases. Her focus is machine learning and artificial intelligence, analytics, and application-driven software-hardware optimization and co-design. Prior to joining Cerebras, Natalia was a Sr. Research Manager at Hewlett Packard Labs, where she led the Software and AI group and served as the head of HP Labs Russia from 2011 until 2015. Prior to Hewlett Packard, she was an Associate Professor at St. Petersburg State University in Russia and worked as a software engineer for several IT companies. Natalia holds a Ph.D. in computer science from St. Petersburg State University.
