

Invited Talk in Workshop: Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)

Invited Talk (Danqi Chen): Analyzing Training Objectives and Trajectories in Language Pre-training

Danqi Chen


Abstract:

In this talk, I will present several empirical studies on understanding and analyzing the pre-training of language models. I will start with BERT’s pre-training/fine-tuning paradigm and discuss how pre-training objectives influence downstream performance. Then, I will move on to the scaling of autoregressive large language models. By analyzing intermediate training checkpoints, we present several interesting findings on token-level perplexity, sentence-level generation, and their correlation with in-context learning on downstream tasks. I hope these findings encourage more theoretical understanding and improved pre-training in the future.

Bio: Danqi Chen is an Assistant Professor of Computer Science at Princeton University and co-leads the Princeton NLP Group. Her recent research focuses on training, adapting, and understanding large language models, and on developing scalable and efficient NLP systems for question answering, information extraction, and conversational agents. Before joining Princeton, Danqi worked as a visiting scientist at Facebook AI Research. She received her Ph.D. from Stanford University (2018) and her B.E. from Tsinghua University (2012), both in Computer Science. Her research has been recognized with a Sloan Fellowship, an NSF CAREER award, a Samsung AI Researcher of the Year award, outstanding paper awards from ACL and EMNLP, and multiple industry faculty awards.
