Track: Oral 5 Track 1: Unsupervised and Self-supervised learning & Social Aspects of Machine Learning

Wed 3 May 1:00 - 1:10 PDT

In-Person Oral presentation / top 25% paper

Progress measures for grokking via mechanistic interpretability

Neel Nanda · Lawrence Chan · Tom Lieberum · Jess Smith · Jacob Steinhardt

Neural networks often exhibit emergent behavior, in which qualitatively new capabilities arise from scaling up the number of parameters, the amount of training data, or even the number of training steps. One approach to understanding emergence is to find the continuous progress measures that underlie the seemingly discontinuous qualitative changes. In this work, we argue that progress measures can be found via mechanistic interpretability, that is, by reverse engineering learned models into components and measuring the progress of each component over the course of training. As a case study, we investigate small transformers trained on modular arithmetic tasks with emergent grokking behavior. We fully reverse engineer the algorithm learned by these networks, which uses discrete Fourier transforms and trigonometric identities to convert addition to rotation about a circle. After confirming the algorithm via ablation, we use our understanding of the algorithm to define progress measures that precede the grokking phase transition on this task. We see our result as demonstrating both that it is possible to fully reverse engineer trained networks, and that doing so can be invaluable to understanding their training dynamics.
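As a rough illustration of how addition can be turned into rotation about a circle, the sketch below computes (a + b) mod p using only sines, cosines, and the angle-addition identities. It is a minimal sketch, not the trained network's weights or the authors' code: the function name, the modulus p = 113 in the usage note, and the frequencies are all hypothetical choices made for the example.

```python
import numpy as np

def modular_add_via_rotation(a, b, p, freqs=(1, 2, 3)):
    """Return (a + b) mod p using only trigonometric identities.

    `freqs` is a hypothetical set of frequencies chosen for illustration;
    each must be nonzero modulo the prime p.
    """
    c = np.arange(p)            # candidate answers 0..p-1
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p   # angular frequency for this component
        # Angle-addition identities give cos(w(a+b)) and sin(w(a+b))
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w(a+b-c)) peaks exactly where c == (a + b) mod p
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return int(np.argmax(logits))
```

For example, `modular_add_via_rotation(100, 50, 113)` returns 37, since 150 mod 113 = 37. Summing over several frequencies widens the margin between the correct answer and its neighbors, which is why the sketch uses more than one.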

Wed 3 May 1:10 - 1:20 PDT

In-Person Oral presentation / top 25% paper

Localized Randomized Smoothing for Collective Robustness Certification

Jan Schuchardt · Tom Wollschläger · Aleksandar Bojchevski · Stephan Günnemann

Models for image segmentation, node classification and many other tasks map a single input to multiple labels. By perturbing this single shared input (e.g. the image) an adversary can manipulate several predictions (e.g. misclassify several pixels). Collective robustness certification is the task of provably bounding the number of robust predictions under this threat model. The only dedicated method that goes beyond certifying each output independently is limited to strictly local models, where each prediction is associated with a small receptive field. We propose a more general collective robustness certificate for all types of models. We further show that this approach is beneficial for the larger class of softly local models, where each output is dependent on the entire input but assigns different levels of importance to different input regions (e.g. based on their proximity in the image). The certificate is based on our novel localized randomized smoothing approach, where the random perturbation strength for different input regions is proportional to their importance for the outputs. Localized smoothing Pareto-dominates existing certificates on both image segmentation and node classification tasks, simultaneously offering higher accuracy and stronger certificates.

Wed 3 May 1:20 - 1:30 PDT

In-Person Oral presentation / top 25% paper

Towards Interpretable Deep Reinforcement Learning with Human-Friendly Prototypes

Eoin Kenny · Mycal Tucker · Julie Shah

Despite the recent success of deep learning models in research settings, their application in sensitive domains remains limited because of their opaque decision-making processes. Taking up this challenge, researchers have proposed various eXplainable AI (XAI) techniques designed to calibrate trust in, and improve the understandability of, black-box models, with the vast majority of work focused on supervised learning. Here, we focus on making an "interpretable-by-design" deep reinforcement learning agent which is forced to use human-friendly prototypes in its decisions, thus making its reasoning process clear. Our proposed method, dubbed Prototype-Wrapper Network (PW-Net), wraps around any neural agent backbone, and our results indicate that it does not worsen performance relative to black-box models. Most importantly, we found in a user study that PW-Nets supported better trust calibration and task performance relative to standard interpretability approaches and black-boxes.

Wed 3 May 1:30 - 1:40 PDT

In-Person Oral presentation / top 25% paper

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks

Tuomas Oikarinen · Tsui-Wei Weng

In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. CLIP-Dissect leverages recent advances in multimodal vision/language models to label internal neurons with open-ended concepts without the need for any labeled data or human examples. We show that CLIP-Dissect provides more accurate descriptions than existing methods for last-layer neurons, where the ground truth is available, as well as qualitatively good descriptions for hidden-layer neurons. In addition, our method is very flexible: it is model agnostic, can easily handle new concepts, and can be extended to take advantage of better multimodal models in the future. Finally, CLIP-Dissect is computationally efficient and can label all neurons from five layers of ResNet-50 in just 4 minutes, which is more than 10$\times$ faster than existing methods. Our code is available at https://github.com/Trustworthy-ML-Lab/CLIP-dissect.

Wed 3 May 1:40 - 1:50 PDT

In-Person Oral presentation / top 25% paper

Model-based Causal Bayesian Optimization

Scott Sussex · Anastasia Makarova · Andreas Krause

How should we intervene on an unknown structural equation model to maximize a downstream variable of interest? This setting, also known as causal Bayesian optimization (CBO), has important applications in medicine, ecology, and manufacturing. Standard Bayesian optimization algorithms fail to effectively leverage the underlying causal structure. Existing CBO approaches assume noiseless measurements and do not come with guarantees. We propose the model-based causal Bayesian optimization algorithm (MCBO) that learns a full system model instead of only modeling intervention-reward pairs. MCBO propagates epistemic uncertainty about the causal mechanisms through the graph and trades off exploration and exploitation via the optimism principle. We bound its cumulative regret, and obtain the first non-asymptotic bounds for CBO. Unlike in standard Bayesian optimization, our acquisition function cannot be evaluated in closed form, so we show how the reparameterization trick can be used to apply gradient-based optimizers. The resulting practical implementation of MCBO compares favorably with state-of-the-art approaches empirically.

Wed 3 May 1:50 - 2:00 PDT

In-Person Oral presentation / top 25% paper

Corrupted Image Modeling for Self-Supervised Visual Pre-Training

Yuxin Fang · Li Dong · Hangbo Bao · Xinggang Wang · Furu Wei

We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training. CIM uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of using artificial [MASK] tokens, where some patches are randomly selected and replaced with plausible alternatives sampled from the BEiT output distribution. Given this corrupted image, an enhancer network learns to either recover all the original image pixels, or predict whether each visual token is replaced by a generator sample or not. The generator and the enhancer are simultaneously trained and synergistically updated. After pre-training, the enhancer can be used as a high-capacity visual encoder for downstream tasks. CIM is a general and flexible visual pre-training framework that is suitable for various network architectures. For the first time, CIM demonstrates that both ViT and CNN can learn rich visual representations using a unified, non-Siamese framework. Experiments show that our approach achieves compelling results on vision benchmarks such as ImageNet classification and ADE20K semantic segmentation.

Wed 3 May 2:00 - 2:10 PDT

In-Person Oral presentation / top 5% paper

SimPer: Simple Self-Supervised Learning of Periodic Targets

Yuzhe Yang · Xin Liu · Jiang Wu · Silviu Borac · Dina Katabi · Ming-Zher Poh · Daniel McDuff

From human physiology to environmental evolution, important processes in nature often exhibit meaningful and strong periodic or quasi-periodic changes. Due to their inherent label scarcity, learning useful representations for periodic tasks with limited or no supervision is of great benefit. Yet, existing self-supervised learning (SSL) methods overlook the intrinsic periodicity in data, and fail to learn representations that capture periodic or frequency attributes. In this paper, we present SimPer, a simple contrastive SSL regime for learning periodic information in data. To exploit the periodic inductive bias, SimPer introduces customized augmentations, feature similarity measures, and a generalized contrastive loss for learning efficient and robust periodic representations. Extensive experiments on common real-world tasks in human behavior analysis, environmental sensing, and healthcare domains verify the superior performance of SimPer compared to state-of-the-art SSL methods, highlighting its intriguing properties including better data efficiency, robustness to spurious correlations, and generalization to distribution shifts.

Wed 3 May 2:10 - 2:20 PDT

In-Person Oral presentation / top 25% paper

Simplicial Embeddings in Self-Supervised Learning and Downstream Classification

Samuel Lavoie · Christos Tsirigotis · Max Schwarzer · Ankit Vani · Mikhail Noukhovitch · Kenji Kawaguchi · Aaron Courville

Simplicial Embeddings (SEM) are representations learned through self-supervised learning (SSL), wherein a representation is projected into $L$ simplices of $V$ dimensions each using a softmax operation. This procedure conditions the representation onto a constrained space during pretraining and imparts an inductive bias for group sparsity. For downstream classification, we formally prove that the SEM representation leads to better generalization than an unnormalized representation. Furthermore, we empirically demonstrate that SSL methods trained with SEMs have improved generalization on natural image datasets such as CIFAR-100 and ImageNet. Finally, when used in a downstream classification task, we show that SEM features exhibit emergent semantic coherence where small groups of learned features are distinctly predictive of semantically-relevant classes.
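The projection described above can be sketched in a few lines of NumPy: split a feature vector of length L·V into L groups of V entries and apply a softmax within each group, so every group lies on a probability simplex. This is a minimal sketch under stated assumptions, not the authors' implementation; the function name and the temperature parameter `tau` are hypothetical additions for illustration.

```python
import numpy as np

def simplicial_embedding(z, L, V, tau=1.0):
    """Project features of size L*V onto L simplices of V dimensions each.

    `tau` is an assumed softmax temperature, not taken from the abstract.
    """
    z = np.asarray(z, dtype=float)
    if z.shape[-1] != L * V:
        raise ValueError("last dimension of z must equal L * V")
    # Split the feature axis into L groups of V entries
    groups = z.reshape(*z.shape[:-1], L, V) / tau
    # Subtract the per-group maximum for numerical stability
    groups = groups - groups.max(axis=-1, keepdims=True)
    exp = np.exp(groups)
    probs = exp / exp.sum(axis=-1, keepdims=True)  # softmax within each group
    # Concatenate the L simplices back into one vector per example
    return probs.reshape(z.shape)
```

Each group of V outputs is non-negative and sums to one. In this sketch, lowering `tau` pushes each group toward a one-hot vector, which is one way to picture the group-sparsity bias mentioned in the abstract.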

AD11
