Invited Talk in Workshop: Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)

Invited Talk (Jonathan Frankle): Faster Neural Network Training, Algorithmically

Jonathan Frankle


Abstract:

Training modern neural networks is time-consuming, expensive, and energy-intensive. As neural network training costs double every few months, it is difficult for researchers and businesses without immense budgets to keep up, especially as hardware improvements stagnate. In this talk, I will describe my favored approach for managing this challenge: changing the workload itself - the training algorithm. Unlike most workloads in computer science, machine learning is approximate, and we need not worry about changing the underlying algorithm so long as we properly account for the consequences. I will discuss how we have put this approach into practice at MosaicML, including the dozens of algorithmic changes we have studied (which are freely available as open source), the science behind how these changes interact with each other (the composition problem), and how we evaluate whether these changes have been effective. I will also detail several surprises we have encountered and lessons we have learned along the way. In the time since we began this work, we have reduced the training times of standard computer vision models by 5-7x and standard language models by 2-3x, and we are just scratching the surface. I will close with a number of open research questions we have encountered that merit the attention of the research community. This is the collective work of a dozen empirical deep learning researchers at MosaicML, and I am simply the messenger.

Bio: Jonathan Frankle is Chief Scientist at MosaicML, where he leads the company's research team toward the goal of developing more efficient algorithms for training neural networks. During his PhD at MIT, he empirically studied deep learning with Prof. Michael Carbin, focusing on the properties of sparse networks that allow them to train effectively (his "Lottery Ticket Hypothesis" - ICLR 2019 Best Paper). In addition to his technical work, he is actively involved in policymaking around challenges related to machine learning. He will be joining the computer science faculty at Harvard in the fall of 2023. He earned his BSE and MSE in computer science at Princeton, has previously spent time at Google Brain, Facebook AI Research, and Microsoft as an intern, and has served as an Adjunct Professor of Law at Georgetown Law.
