Workshop
Mathematical and Empirical Understanding of Foundation Models (ME-FoMo)
Ananya Kumar · Tengyu Ma · Tiffany Vlaar · Aditi Raghunathan · Hanie Sedghi · Yamini Bansal · Sang Michael Xie · Percy Liang · Mathilde Caron
AD10
Thu 4 May, 12:15 a.m. PDT
Foundation models (FMs) are models trained on a large and diverse pool of data that can be adapted to a wide range of tasks. Recent examples include large language models (GPT-3, BERT, PaLM), image representation encoders (SimCLR), and image-text models (CLIP, DALL-E), all of which have revolutionized the way models are built in their domains. Yet FMs remain poorly understood: their core driving principle is transfer learning, but scale and modern self-supervision techniques have led to emergent capabilities we might not have anticipated. The goal of this workshop is to highlight research that aims to improve our understanding of FMs. We interpret understanding liberally, as research ranging from purely empirical papers that highlight interesting phenomena to work that attempts to explain or provide theoretical foundations for such phenomena, possibly in simplified settings.
Schedule
Thu 12:15 a.m. - 12:45 a.m. | Invited Talk (Yann Dauphin): Leveraging Multiple Models and Multiple Tasks
Thu 12:45 a.m. - 12:50 a.m. | Q&A
Thu 12:50 a.m. - 1:20 a.m. | Invited Talk (Jared Kaplan): AI Safety, RLHF, and Self-Supervision
Thu 1:20 a.m. - 1:25 a.m. | Q&A
Thu 1:25 a.m. - 1:35 a.m. | Coffee Break
Thu 1:35 a.m. - 2:05 a.m. | Invited Talk (Lenka Zdeborová): Insights from exactly solvable high-dimensional models
Thu 2:05 a.m. - 2:10 a.m. | Q&A
Thu 2:10 a.m. - 2:40 a.m. | Invited Talk (Sanjeev Arora): Task-specific Skill Localization in Fine-tuned Language Models
Thu 2:40 a.m. - 2:45 a.m. | Q&A
Thu 4:00 a.m. - 5:00 a.m. | Poster Session
Accelerating Neural Self-Improvement via Bootstrapping | Kazuki Irie · Jürgen Schmidhuber
Mini-Batch Optimization of Contrastive Loss | Kartik Sreenivasan · Keon Lee · Jeong-Gwan Lee · Anna Lee · Jaewoong Cho · Jy-yong Sohn · Dimitris Papailiopoulos · Kangwook Lee
On the Role of Attention in Prompt-tuning | Samet Oymak · Ankit Singh Rawat · Mahdi Soltanolkotabi · Christos Thrampoulidis
Looped Transformers as Programmable Computers | Angeliki Giannou · Shashank Rajput · Jy-yong Sohn · Kangwook Lee · Jason Lee · Dimitris Papailiopoulos
Diffusion Models are Minimax Optimal Distribution Estimators | Kazusato Oko · Akiyama Shunta · Taiji Suzuki
The Effects of Pretraining Task Diversity on In-Context Learning of Ridge Regression | Allan Raventos · Mansheej Paul · Feng Chen · Surya Ganguli
Conservative Prediction via Transductive Confidence Minimization | Caroline Choi · Fahim Tajwar · Yoonho Lee · Huaxiu Yao · Ananya Kumar · Chelsea Finn
Controlled assessment of CLIP-style language-aligned vision models in prediction of brain & behavioral data | Colin Conwell · Jacob Prince · Christopher Hamblin · George Alvarez
The Independent Compositional Subspace Hypothesis for the Structure of CLIP's Last Layer | Max Wolff · Wieland Brendel · Stuart Wolff
Exploring Demonstration Ensembling for In-context Learning | Muhammad Khalifa · Lajanugen Logeswaran · Moontae Lee · Honglak Lee · Lu Wang
A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks | William Merrill · Nikolaos Tsilivis · Aman Shukla
A Comprehensive Benchmark of Human-Like Relational Reasoning for Text-to-Image Foundation Models | Colin Conwell · Tomer Ullman
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations | Shashank Shekhar · Florian Bordes · Pascal Vincent · Ari Morcos
Robustness of edited neural networks | Davis Brown · Charles Godfrey · Cody Nizinski · Jonathan Tu · Henry Kvinge
Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension | Henry Kvinge · Davis Brown · Charles Godfrey
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters | Boshi Wang · Sewon Min · Xiang Deng · Jiaming Shen · You Wu · Luke Zettlemoyer · Huan Sun
SemDeDup: Data-efficient learning at web-scale through semantic deduplication | Amro Kamal · Kushal Tirumala · Daniel Simig · Surya Ganguli · Ari Morcos
Effective Data Augmentation With Diffusion Models | Brandon Trabucco · Kyle Doherty · Max Gurinas · Ruslan Salakhutdinov
Text-to-Image Diffusion Models are Zero-Shot Classifiers | Kevin Clark · Priyank Jaini
Simple Hardware-Efficient Long Convolutions for Sequence Modeling | Dan Fu · Elliot Epstein · Eric Nguyen · Armin Thomas · Michael Zhang · Tri Dao · Atri Rudra · Christopher Re
Understanding HTML with Large Language Models | Izzeddin Gur · Ofir Nachum · Yingjie Miao · Mustafa Safdari · Austin Huang · Aakanksha Chowdhery · Sharan Narang · Noah Fiedel · Aleksandra Faust
Instruction-Finetuned Foundation Models for Multimodal Web Navigation | Hiroki Furuta · Ofir Nachum · Kuang-Huei Lee · Yutaka Matsuo · Shixiang Gu · Izzeddin Gur
Out-of-context Meta-learning in Large Language Models | Dmitrii Krasheninnikov · Egor Krasheninnikov · David Krueger
What Happens to the Source Domain in Transfer Learning? | Amal Alnouri · Bilal Alsallakh
Modality-Aware Adaptation of Contrastive Language-Image Models | Alexander Long · Thalaiyasingam Ajanthan · Anton Hengel
TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns | Soma Onishi · Kenta Oono · Kohei Hayashi
Do Video-Language Foundation Models have a Sense of Time? | Piyush Nitin Bagad · Makarand Tapaswi · Cees G Snoek
What Contrastive Learning Learns Beyond Class-wise Features? | Xingyuming Liu · Yifei Wang · Yisen Wang
Look Globally and Locally: Inter-Intra Contrastive Learning from Unlabeled Videos | David Fan · Deyu Yang · Xinyu Li · Vimal Bhat · Rohith MV
Improving Foundation Models for Few-Shot Learning via Multitask Finetuning | Zhuoyan Xu · Zhenmei Shi · Junyi Wei · Yin Li · Yingyu Liang
A Kernel-Based View of Language Model Fine-Tuning | Sadhika Malladi · Alexander Wettig · Dingli Yu · Danqi Chen · Sanjeev Arora
Variable Discretization for Self-Supervised Learning | Chuang Niu · Wenjun Xia · Ge Wang
Empirical Analysis of the Strengths and Weaknesses of PEFT Techniques for LLMs | George Pu · Anirudh Jain · Jihan Yin · Russell Kaplan
AWE: Adaptive weight-space ensembling for few-shot fine-tuning | Jean-Christophe Gagnon-Audet · David J Schwab · Ricardo Monti
Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations | Xinxi Lyu · Sewon Min · Iz Beltagy · Luke Zettlemoyer · Hannaneh Hajishirzi
Variational prompt tuning improves generalization of vision-language foundation models | Mohammad Mahdi Derakhshani · Enrique Sanchez · Adrian Bulat · Victor Guilherme Turrisi da Costa · Cees G Snoek · Georgios Tzimiropoulos · Brais Martinez
Aligning Foundation Models for Language with Preferences through $f$-divergence Minimization | Dongyoung Go · Tomek Korbak · Germàn Kruszewski · Jos Rozen · Nahyeon Ryu · Marc Dymetman
The SSL Interplay: Augmentations, Inductive Bias, and Generalization | Vivien Cabannes · Bobak Kiani · Randall Balestriero · Yann LeCun · Alberto Bietti
Retrieval of Soft Prompt Enhances Zero-Shot Task Generalization | Seonghyeon Ye · Joel Jang · Doyoung Kim · Yongrae Jo · Minjoon Seo
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | Seonghyeon Ye · Doyoung Kim · Joel Jang · Joongbo Shin · Minjoon Seo
Project with Source, Probe with Target: Extracting Useful Features for Adaptation to Distribution Shifts | Annie Chen · Yoonho Lee · Amrith Setlur · Sergey Levine · Chelsea Finn
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers | Damai Dai · Yutao Sun · Li Dong · Yaru Hao · Shuming Ma · Zhifang Sui · Furu Wei
Towards Foundation Models with Mathematical Understanding | Peter Belcak · Roger Wattenhofer
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons | Banghua Zhu · Jiantao Jiao · Michael Jordan
Broken Neural Scaling Laws | Ethan Caballero · Kshitij Gupta · Irina Rish · David Krueger
Coordinating Multiple Vision-Language Models for Visual Reasoning | Liangyu Chen · Bo Li · Sheng Shen · Jingkang Yang · Chunyuan Li · Kurt Keutzer · Trevor Darrell · Ziwei Liu
Thu 5:00 a.m. - 5:05 a.m. | Spotlight: Diffusion Models are Minimax Optimal Distribution Estimators | Kazusato Oko · Akiyama Shunta · Taiji Suzuki
Thu 5:08 a.m. - 5:13 a.m. | Spotlight: Text-to-Image Diffusion Models are Zero-Shot Classifiers | Kevin Clark · Priyank Jaini
Thu 5:16 a.m. - 5:21 a.m. | Spotlight: Exploring Demonstration Ensembling for In-context Learning | Muhammad Khalifa · Lajanugen Logeswaran · Moontae Lee · Honglak Lee · Lu Wang
Thu 5:24 a.m. - 5:29 a.m. | Spotlight: Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations | Shashank Shekhar · Florian Bordes · Pascal Vincent · Ari Morcos
Thu 5:32 a.m. - 5:37 a.m. | Spotlight: Effective Data Augmentation With Diffusion Models | Brandon Trabucco · Kyle Doherty · Max Gurinas · Ruslan Salakhutdinov
Thu 5:40 a.m. - 5:45 a.m. | Spotlight: Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | Seonghyeon Ye · Doyoung Kim · Joel Jang · Joongbo Shin · Minjoon Seo
Thu 5:48 a.m. - 5:53 a.m. | Spotlight: A Kernel-Based View of Language Model Fine-Tuning | Sadhika Malladi · Alexander Wettig · Dingli Yu · Danqi Chen · Sanjeev Arora
Thu 6:00 a.m. - 6:30 a.m. | Invited Talk (Yasaman Bahri): Understanding Neural Scaling Laws
Thu 6:30 a.m. - 6:35 a.m. | Q&A
Thu 6:35 a.m. - 7:05 a.m. | Invited Talk (Danqi Chen): Analyzing Training Objectives and Trajectories in Language Pre-training
Thu 7:05 a.m. - 7:10 a.m. | Q&A
Thu 7:10 a.m. - 7:40 a.m. | Invited Talk (Jonathan Frankle): Faster Neural Network Training, Algorithmically
Thu 7:40 a.m. - 7:45 a.m. | Q&A