

Keynote Talk
in
Workshop: The Fourth Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV): Highlighting New Architectures for Future Foundation Models

How to build fully open language models: from pre-training to post-training

Hannaneh Hajishirzi

Sat 14 Dec, 2:30–3:00 p.m. PST

Abstract:

Language models (LMs) have become ubiquitous in both AI research and commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed.

In this talk, I present our OLMo project, aimed at building strong language models and making them fully accessible to researchers, along with open-source code for data, training, and inference. Training language models is expensive, so we optimize for quality versus compute cost. I focus on how improvements in data, architecture, and training advance models at both the pre-training and post-training stages at lower compute cost.
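Because the OLMo checkpoints and inference code are openly released, running the models takes only a few lines. Below is a minimal sketch, assuming the allenai/OLMo-7B-hf checkpoint on the Hugging Face Hub and the transformers library; the exact model id and API surface may vary across releases.

```python
# Minimal inference sketch for an open OLMo checkpoint.
# Assumes the `allenai/OLMo-7B-hf` weights on the Hugging Face Hub
# and a transformers version with native OLMo support.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("Language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```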
