Talk / Overview

Training modern neural networks is time-consuming, expensive, and energy-intensive. As neural network training costs increase exponentially, it is difficult for researchers and businesses without immense budgets to keep up, especially as hardware improvements stagnate. In this talk, I will describe my favored approach for managing this challenge: changing the workload itself, namely the training algorithm. Unlike most workloads in computer science, machine learning is approximate, and we need not worry about changing the underlying algorithm so long as we properly account for the consequences. I will discuss how we have put this approach into practice at MosaicML, including the dozens of algorithmic changes we have studied (which are freely available as open source), the science behind how these changes interact with each other (the composition problem), and how we evaluate whether these changes have been effective. I will also detail several surprises we have encountered and lessons we have learned along the way. In the time since we began this work, we have reduced the training times of standard models like ResNet-50, Stable Diffusion, and GPT-3 by 5x to 10x, and we're just scratching the surface. I will close with a number of open research questions we have encountered that merit the attention of the research community. This is the collective work of many empirical deep learning researchers at MosaicML, and I'm simply the messenger.
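For a sense of what composing these algorithmic changes can look like in practice, below is a minimal sketch using MosaicML's open-source Composer library. The synthetic data, model wrapper, and the particular list of algorithms are illustrative placeholders rather than the exact recipes behind the ResNet-50, Stable Diffusion, or GPT-3 results, and constructor arguments may differ across Composer versions.

```python
# Minimal sketch: composing several training-speedup algorithms with MosaicML's
# open-source Composer library. The synthetic data, model wrapper, and chosen
# algorithms are illustrative placeholders, not the exact recipes from the talk.
import torch
import torchvision
from torch.utils.data import DataLoader, TensorDataset

from composer import Trainer
from composer.algorithms import BlurPool, ChannelsLast, LabelSmoothing, ProgressiveResizing
from composer.models import ComposerClassifier

# Tiny synthetic "ImageNet-shaped" dataset so the sketch is self-contained.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 1000, (64,))
train_dataloader = DataLoader(TensorDataset(images, labels), batch_size=16)

# Wrap a standard torchvision ResNet-50 so Composer can train it.
model = ComposerClassifier(torchvision.models.resnet50(), num_classes=1000)

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration="2ep",  # a real run would use something like "90ep"
    optimizers=torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9),
    algorithms=[
        BlurPool(),                     # anti-aliased downsampling
        ChannelsLast(),                 # NHWC memory format for faster convolutions
        LabelSmoothing(smoothing=0.1),  # soften targets during training
        ProgressiveResizing(),          # small images early, full resolution later
    ],
)
trainer.fit()
```

Each algorithm hooks into the training loop at defined events, which is what makes studying their composition tractable: methods can be added, removed, or reordered without rewriting the loop itself.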

Talk / Speakers

Jonathan Frankle

Chief Scientist, MosaicML
