Keynote Talk
in
Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models
Optimizing Data Use for Efficient Pre-training
Danqi Chen
Abstract:
Training large language models relies heavily on the quality and composition of data, yet optimizing data selection and utilization remains a significant challenge in the field. In this talk, I will outline several key ideas for enhancing training efficiency through better data use and present findings from my lab on selecting high-quality datasets and optimizing data compositions. I will also introduce a simple yet powerful pre-training approach that conditions on metadata associated with the training data. This approach is remarkably straightforward to implement, incurs minimal computational overhead, and yields significant efficiency gains.
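
To make the metadata-conditioning idea concrete, below is a minimal, hypothetical Python sketch of one way such conditioning could be set up: each training document is prefixed with a piece of associated metadata (here assumed to be a source URL) before tokenization, so the model conditions on it during pre-training. The class, function names, and the choice of URL as metadata are illustrative assumptions, not the exact recipe presented in the talk.

from dataclasses import dataclass

@dataclass
class Document:
    text: str
    url: str  # metadata assumed to accompany each training document

def build_training_example(doc: Document, condition_on_metadata: bool = True) -> str:
    """Prepend metadata to the document text before tokenization."""
    if condition_on_metadata and doc.url:
        # The metadata prefix adds only a few tokens per document,
        # so the extra computational overhead is negligible.
        return f"URL: {doc.url}\n\n{doc.text}"
    return doc.text

# Example usage: the same corpus can be prepared with or without conditioning.
docs = [Document(text="Transformers are a neural network architecture ...",
                 url="en.wikipedia.org/wiki/Transformer_(deep_learning)")]
for d in docs:
    example = build_training_example(d)
    # `example` would then be tokenized and packed into pre-training batches.
    print(example[:80])

The sketch highlights why the overhead is minimal: conditioning amounts to a short textual prefix per document, leaving the model architecture and training loop unchanged.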