Zoom presentation in Competition: NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1 GPU + 1 Day
Invited Speaker: Mojan Javaheripi (Microsoft Research) - Unleashing the power of Small Language Models
Over the past few months, we have released a suite of small language models (SLMs) called “Phi” that achieve unprecedented performance on a variety of benchmarks. Our first model, the 1.3 billion parameter Phi-1, achieved state-of-the-art performance on Python coding among SLMs. We then extended our focus to common sense reasoning and language understanding, and created a new 1.3 billion parameter model named Phi-1.5, with performance comparable to models 5x larger. Our latest model, the 2.7 billion parameter Phi-2, surpasses Phi-1.5 on all benchmarks, thanks to new innovations in model scaling and training data curation. In this talk, I will introduce the Phi SLMs and discuss two key insights driving their performance: 1) the generation and use of “textbook quality” data, which elevates the learning process compared with conventional web data, and 2) the incorporation of best practices for scaling up models to enhance overall performance.