ICLR 2021 Papers

UMEC: Unified model and embedding compression for efficient recommendation systems

Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification

ResNet After All: Neural ODEs and Their Numerical Solution

Deep Partition Aggregation: Provable Defenses against General Poisoning Attacks

MetaNorm: Learning to Normalize Few-Shot Batches Across Domains

Fidelity-based Deep Adiabatic Scheduling

Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching

CPT: Efficient Deep Neural Network Training via Cyclic Precision

Tilted Empirical Risk Minimization

Improved Estimation of Concentration Under $\ell_p$-Norm Distance Metrics Using Half Spaces

ALFWorld: Aligning Text and Embodied Environments for Interactive Learning

SEDONA: Search for Decoupled Neural Networks toward Greedy Block-wise Learning

Learning to Generate 3D Shapes with Generative Cellular Automata

Sliced Kernelized Stein Discrepancy

Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation

VA-RED$^2$: Video Adaptive Redundancy Reduction

On InstaHide, Phase Retrieval, and Sparse Matrix Factorization

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Statistical inference for individual fairness

Emergent Symbols through Binding in External Memory

Early Stopping in Deep Networks: Double Descent and How to Eliminate it

On the Universality of the Double Descent Peak in Ridgeless Regression

Evaluation of Neural Architectures Trained With Square Loss vs Cross-Entropy in Classification Tasks

Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs

Simple Spectral Graph Convolution

PolarNet: Learning to Optimize Polar Keypoints for Keypoint Based Object Detection

Deconstructing the Regularization of BatchNorm

RMSprop converges with proper hyper-parameter

Generative Scene Graph Networks

Learnable Embedding sizes for Recommender Systems

Overfitting for Fun and Profit: Instance-Adaptive Data Compression

Acting in Delayed Environments with Non-Stationary Markov Policies

ARMOURED: Adversarially Robust MOdels using Unlabeled data by REgularizing Diversity

Solving Compositional Reinforcement Learning Problems via Task Reduction

A Geometric Analysis of Deep Generative Image Models and Its Applications

A Discriminative Gaussian Mixture Model with Sparsity

Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels

Certify or Predict: Boosting Certified Robustness with Compositional Architectures

Taming GANs with Lookahead-Minmax

Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis

Learning Subgoal Representations with Slow Dynamics

GAN2GAN: Generative Noise Learning for Blind Denoising with Single Noisy Images

CO2: Consistent Contrast for Unsupervised Visual Representation Learning

CPR: Classifier-Projection Regularization for Continual Learning

MARS: Markov Molecular Sampling for Multi-objective Drug Discovery

Fooling a Complete Neural Network Verifier

Representation Learning via Invariant Causal Mechanisms

Interpreting and Boosting Dropout from a Game-Theoretic View

Quantifying Differences in Reward Functions

BOIL: Towards Representation Change for Few-shot Learning

Generating Adversarial Computer Programs using Optimized Obfuscations

On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis

Stabilized Medical Image Attacks

FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization

Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections

Memory Optimization for Deep Networks

Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering

Identifying nonlinear dynamical systems with multiple time scales and long-range dependencies

Hyperbolic Neural Networks++

Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies

Spatially Structured Recurrent Modules

Teaching Temporal Logics to Neural Networks

Self-supervised Visual Reinforcement Learning with Object-centric Representations

On Self-Supervised Image Representations for GAN Evaluation

Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time

Reset-Free Lifelong Learning with Skill-Space Planning

Fast Geometric Projections for Local Robustness Certification

TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks

Adapting to Reward Progressivity via Spectral Reinforcement Learning

Decentralized Attribution of Generative Models

Combining Physics and Machine Learning for Network Flow Estimation

Large Batch Simulation for Deep Reinforcement Learning

Byzantine-Resilient Non-Convex Stochastic Gradient Descent

Accurate Learning of Graph Representations with Graph Multiset Pooling

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning

HyperDynamics: Meta-Learning Object and Agent Dynamics with Hypernetworks

Anytime Sampling for Autoregressive Models via Ordered Autoencoding

Disentangling 3D Prototypical Networks for Few-Shot Concept Learning

Iterated learning for emergent systematicity in VQA

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning

Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains

Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?

Learning to Make Decisions via Submodular Regularization

Adaptive and Generative Zero-Shot Learning

CompOFA – Compound Once-For-All Networks for Faster Multi-Platform Deployment

LambdaNetworks: Modeling long-range Interactions without Attention

Orthogonalizing Convolutional Layers with the Cayley Transform

Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs

Gradient Projection Memory for Continual Learning

More or Less: When and How to Build Convolutional Neural Network Ensembles

Efficient Empowerment Estimation for Unsupervised Stabilization

A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference

Combining Ensembles and Data Augmentation Can Harm Your Calibration

Fourier Neural Operator for Parametric Partial Differential Equations

Saliency is a Possible Red Herring When Diagnosing Poor Generalization

Provably robust classification of adversarial examples with detection

SAFENet: A Secure, Accurate and Fast Neural Network Inference

MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training

Improved Autoregressive Modeling with Distribution Smoothing

Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning

Combining Label Propagation and Simple Models out-performs Graph Neural Networks

Local Search Algorithms for Rank-Constrained Convex Optimization

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

Decoupling Global and Local Representations via Invertible Generative Flows

SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing

Evaluating the Disentanglement of Deep Generative Models through Manifold Topology

End-to-End Egospheric Spatial Memory

Neural Approximate Sufficient Statistics for Implicit Models

Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification

BREEDS: Benchmarks for Subpopulation Shift

PAC Confidence Predictions for Deep Neural Network Classifiers

Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

Hopper: Multi-hop Transformer for Spatiotemporal Reasoning

Why resampling outperforms reweighting for correcting sampling bias with stochastic gradients

In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning

Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms

AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition

Dataset Meta-Learning from Kernel Ridge-Regression

Repurposing Pretrained Models for Robust Out-of-domain Few-Shot Learning

On Position Embeddings in BERT

Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions

Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy

Uncertainty Sets for Image Classifiers using Conformal Prediction

Conditional Negative Sampling for Contrastive Learning of Visual Representations

Faster Binary Embeddings for Preserving Euclidean Distances

Model-Based Offline Planning

Neural Networks for Learning Counterfactual G-Invariances from Single Environments

Learning Energy-Based Models by Diffusion Recovery Likelihood

QPLEX: Duplex Dueling Multi-Agent Q-Learning

Does enhanced shape bias improve neural network robustness to common corruptions?

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning

PDE-Driven Spatiotemporal Disentanglement

Mapping the Timescale Organization of Neural Language Models

The Intrinsic Dimension of Images and Its Impact on Learning

Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models

CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers

Long Range Arena : A Benchmark for Efficient Transformers

Recurrent Independent Mechanisms

Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering

SALD: Sign Agnostic Learning with Derivatives

WaveGrad: Estimating Gradients for Waveform Generation

Linear Last-iterate Convergence in Constrained Saddle-point Optimization

Go with the flow: Adaptive control for Neural ODEs

Understanding Over-parameterization in Generative Adversarial Networks

Multiscale Score Matching for Out-of-Distribution Detection

Random Feature Attention

Tradeoffs in Data Augmentation: An Empirical Study

Rapid Task-Solving in Novel Environments

Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers

Unsupervised Discovery of 3D Physical Objects

Sample-Efficient Automated Deep Reinforcement Learning

Learning Structural Edits via Incremental Tree Transformations

Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis

Practical Real Time Recurrent Learning with a Sparse Approximation

Private Post-GAN Boosting

Modeling the Second Player in Distributionally Robust Optimization

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

Representation learning for improved interpretability and classification accuracy of clinical factors from EEG

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

Multi-Level Local SGD: Distributed SGD for Heterogeneous Hierarchical Networks

R-GAP: Recursive Gradient Attack on Privacy

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation

Isometric Transformation Invariant and Equivariant Graph Convolutional Networks

Chaos of Learning Beyond Zero-sum and Coordination via Game Decompositions

Grounding Language to Autonomously-Acquired Skills via Goal Generation

Trajectory Prediction using Equivariant Continuous Convolution

Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

On the role of planning in model-based deep reinforcement learning

A Hypergradient Approach to Robust Regression without Correspondence

Fast convergence of stochastic subgradient method under interpolation

Wasserstein Embedding for Graph Learning

Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization

Shape or Texture: Understanding Discriminative Features in CNNs

Neurally Augmented ALISTA

Learning from Demonstration with Weakly Supervised Disentanglement

Score-Based Generative Modeling through Stochastic Differential Equations

On Data-Augmentation and Consistency-Based Semi-Supervised Learning

Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch

The role of Disentanglement in Generalisation

Shapley Explanation Networks

C-Learning: Horizon-Aware Cumulative Accessibility Estimation

Multi-resolution modeling of a discrete stochastic process identifies causes of cancer

PC2WF: 3D Wireframe Reconstruction from Raw Point Clouds

Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning

DINO: A Conditional Energy-Based GAN for Domain Translation

HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients

AdaSpeech: Adaptive Text to Speech for Custom Voice

Simple Augmentation Goes a Long Way: ADRL for DNN Quantization

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

SkipW: Resource Adaptable RNN with Strict Upper Computational Limit

Pruning Neural Networks at Initialization: Why Are We Missing the Mark?

On the Origin of Implicit Regularization in Stochastic Gradient Descent

Transient Non-stationarity and Generalisation in Deep Reinforcement Learning

Adversarial score matching and improved sampling for image generation

LiftPool: Bidirectional ConvNet Pooling

Scalable Bayesian Inverse Reinforcement Learning

Return-Based Contrastive Representation Learning for Reinforcement Learning

Implicit Gradient Regularization

Variational Intrinsic Control Revisited

Understanding and Improving Lexical Choice in Non-Autoregressive Translation

Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models

Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets

Complex Query Answering with Neural Link Predictors

Learning to Sample with Local and Global Contexts in Experience Replay Buffer

Gradient Origin Networks

Nonseparable Symplectic Neural Networks

Revisiting Locally Supervised Learning: an Alternative to End-to-end Training

Deep Repulsive Clustering of Ordered Data Based on Order-Identity Decomposition

Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning

Reweighting Augmented Samples by Minimizing the Maximal Expected Loss

Robust early-learning: Hindering the memorization of noisy labels

Monte-Carlo Planning and Learning with Language Action Value Estimates

Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration

Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting

EigenGame: PCA as a Nash Equilibrium

DrNAS: Dirichlet Neural Architecture Search

Graph Edit Networks

Capturing Label Characteristics in VAEs

Neural Delay Differential Equations

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

Group Equivariant Stand-Alone Self-Attention For Vision

Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows

Undistillable: Making A Nasty Teacher That CANNOT teach students

Learning Hyperbolic Representations of Topological Features

Lipschitz Recurrent Neural Networks

Explaining the Efficacy of Counterfactually Augmented Data

Behavioral Cloning from Noisy Demonstrations

Layer-adaptive Sparsity for the Magnitude-based Pruning

Prototypical Representation Learning for Relation Extraction

Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues

Deformable DETR: Deformable Transformers for End-to-End Object Detection

When does preconditioning help or hurt generalization?

Group Equivariant Conditional Neural Processes

Learning from Protein Structure with Geometric Vector Perceptrons

PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies

Molecule Optimization by Explainable Evolution

Predicting Inductive Biases of Pre-Trained Models

Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Multi-timescale Representation Learning in LSTM Language Models

Adaptive Procedural Task Generation for Hard-Exploration Problems

DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs

Extreme Memorization via Scale of Initialization

Prototypical Contrastive Learning of Unsupervised Representations

Learning from others' mistakes: Avoiding dataset biases without modeling them

LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition

WaNet - Imperceptible Warping-based Backdoor Attack

Neural representation and generation for RNA secondary structures

Can a Fruit Fly Learn Word Embeddings?

RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs

Evaluations and Methods for Explanation through Robustness Analysis

gradSim: Differentiable simulation for system identification and visuomotor control

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization

Isotropy in the Contextual Embedding Space: Clusters and Manifolds

Reinforcement Learning with Random Delays

Deep Learning meets Projective Clustering

Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling

On Graph Neural Networks versus Graph-Augmented MLPs

NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation

Distributional Sliced-Wasserstein and Applications to Generative Modeling

MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond

NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-end Learning and Control

Understanding the effects of data parallelism and sparsity on neural network training

Planning from Pixels using Inverse Dynamics Models

Benchmarks for Deep Off-Policy Evaluation

Meta-Learning of Structured Task Distributions in Humans and Machines

Growing Efficient Deep Networks by Structured Continuous Sparsification

Training independent subnetworks for robust prediction

Better Fine-Tuning by Reducing Representational Collapse

Selective Classification Can Magnify Disparities Across Groups

Zero-shot Synthesis with Group-Supervised Learning

Learning Task-General Representations with Generative Neuro-Symbolic Modeling

BERTology Meets Biology: Interpreting Attention in Protein Language Models

Mathematical Reasoning via Self-supervised Skip-tree Training

AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly

BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization

Economic Hyperparameter Optimization With Blended Search Strategy

Average-case Acceleration for Bilinear Games and Normal Matrices

Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network

IsarStep: a Benchmark for High-level Mathematical Reasoning

Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis

SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments

Genetic Soft Updates for Policy Evolution in Deep Reinforcement Learning

On the mapping between Hopfield networks and Restricted Boltzmann Machines

Distance-Based Regularisation of Deep Networks for Fine-Tuning

Ringing ReLUs: Harmonic Distortion Analysis of Nonlinear Feedforward Networks

Generalization in data-driven models of primary visual cortex

Efficient Continual Learning with Modular Networks and Task-Driven Priors

Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time

Activation-level uncertainty in deep neural networks

On Statistical Bias In Active Learning: How and When to Fix It

Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation

Scaling the Convex Barrier with Active Sets

NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation

Symmetry-Aware Actor-Critic for 3D Molecular Design

Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning

Robust Overfitting may be mitigated by properly learned smoothening

Characterizing signal propagation to close the performance gap in unnormalized ResNets

Learning continuous-time PDEs from sparse data with graph neural networks

Latent Skill Planning for Exploration and Transfer

Uncertainty-aware Active Learning for Optimal Bayesian Classifier

Self-supervised Adversarial Robustness for the Low-label, High-data Regime

Single-Photon Image Classification

Unsupervised Object Keypoint Learning using Local Spatial Predictability

CcGAN: Continuous Conditional Generative Adversarial Networks for Image Generation

Differentially Private Learning Needs Better Features (or Much More Data)

Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks

ANOCE: Analysis of Causal Effects with Multiple Mediators via Constrained Structural Learning

Long-tailed Recognition by Routing Diverse Distribution-Aware Experts

Grounded Language Learning Fast and Slow

Transformer protein language models are unsupervised structure learners

Uncertainty Estimation in Autoregressive Structured Prediction

Learning to live with Dale's principle: ANNs with separate excitatory and inhibitory units

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

CT-Net: Channel Tensorization Network for Video Classification

On the Universality of Rotation Equivariant Point Cloud Networks

Universal approximation power of deep residual neural networks via nonlinear control theory

Learning a Latent Search Space for Routing Problems using Variational Autoencoders

A teacher-student framework to distill future trajectories

The Traveling Observer Model: Multi-task Learning Through Spatial Variable Embeddings

What they do when in doubt: a study of inductive biases in seq2seq learners

Group Equivariant Generative Adversarial Networks

Robust Curriculum Learning: from clean label detection to noisy label self-correction

Support-set bottlenecks for video-text representation learning

Graph Information Bottleneck for Subgraph Recognition

Learning Deep Features in Instrumental Variable Regression

Neural Synthesis of Binaural Speech From Mono Audio

Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning

Differentiable Segmentation of Sequences

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

Network Pruning That Matters: A Case Study on Retraining Variants

Degree-Quant: Quantization-Aware Training for Graph Neural Networks

Boost then Convolve: Gradient Boosting Meets Graph Neural Networks

Learning Associative Inference Using Fast Weight Memory

SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization

Towards Robust Neural Networks via Close-loop Control

Differentiable Trust Region Layers for Deep Reinforcement Learning

Discovering Non-monotonic Autoregressive Orderings with Variational Inference

Rethinking Positional Encoding in Language Pre-training

PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics

Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein

DiffWave: A Versatile Diffusion Model for Audio Synthesis

Calibration of Neural Networks using Splines

Exploring Balanced Feature Spaces for Representation Learning

Measuring Massive Multitask Language Understanding

Kanerva++: Extending the Kanerva Machine With Differentiable, Locally Block Allocated Latent Memory

Aligning AI With Shared Human Values

Learning Manifold Patch-Based Representations of Man-Made Shapes

Filtered Inner Product Projection for Crosslingual Embedding Alignment

Correcting experience replay for multi-agent communication

How Benign is Benign Overfitting ?

High-Capacity Expert Binary Networks

Structured Prediction as Translation between Augmented Natural Languages

Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online

Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting

Incremental few-shot learning via vector quantization in deep embedded space

In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness

Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks

MALI: A memory efficient and reverse accurate integrator for Neural ODEs

FedBE: Making Bayesian Model Ensemble Applicable to Federated Learning

Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

Neural Learning of One-of-Many Solutions for Combinatorial Problems in Structured Output Spaces

Adaptive Universal Generalized PageRank Graph Neural Network

Latent Convergent Cross Mapping

Semantic Re-tuning with Contrastive Tension

On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers

GANs Can Play Lottery Tickets Too

Efficient Conformal Prediction via Cascaded Inference with Expanded Admission

Disambiguating Symbolic Expressions in Informal Documents

Lossless Compression of Structured Convolutional Models via Lifting

Uncertainty in Gradient Boosting via Ensembles

An Unsupervised Deep Learning Approach for Real-World Image Denoising

Conformation-Guided Molecular Representation with Hamiltonian Neural Networks

Neural ODE Processes

Towards Robustness Against Natural Language Word Substitutions

Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning

Effective Distributed Learning with Random Features: Improved Bounds and Algorithms

On Learning Universal Representations Across Languages

Minimum Width for Universal Approximation

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments

Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control

Self-Supervised Learning of Compressed Video Representations

Initialization and Regularization of Factorized Neural Layers

Predicting Infectiousness for Proactive Contact Tracing

Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees?

Trusted Multi-View Classification

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning

Efficient Wasserstein Natural Gradients for Reinforcement Learning

Robust Pruning at Initialization

Parameter Efficient Multimodal Transformers for Video Representation Learning

Active Contrastive Learning of Audio-Visual Video Representations

Enforcing robust control guarantees within neural network policies

Contrastive Divergence Learning is a Time Reversal Adversarial Game

Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding

Domain-Robust Visual Imitation Learning with Mutual Information Constraints

Theoretical bounds on estimation error for meta-learning

Towards Impartial Multi-task Learning

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Counterfactual Generative Networks

IOT: Instance-wise Layer Reordering for Transformer Structures

A statistical theory of cold posteriors in deep neural networks

The inductive bias of ReLU networks on orthogonally separable data

A Unified Approach to Interpreting and Boosting Adversarial Transferability

Contextual Transformation Networks for Online Continual Learning

Private Image Reconstruction from System Side Channels Using Generative Models

GAN "Steerability" without optimization

Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?

HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents

Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs

AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights

Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures

Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds

Blending MPC & Value Function Approximation for Efficient Reinforcement Learning

Using latent space regression to analyze and leverage compositionality in GANs

Shape-Texture Debiased Neural Network Training

Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study

DC3: A learning method for optimization with hard constraints

On the geometry of generalization and memorization in deep neural networks

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

Usable Information and Evolution of Optimal Representations During Training

Learning Invariant Representations for Reinforcement Learning without Reconstruction

Parrot: Data-Driven Behavioral Priors for Reinforcement Learning

Zero-Cost Proxies for Lightweight NAS

Perceptual Adversarial Robustness: Defense Against Unseen Threat Models

Deep Neural Network Fingerprinting by Conferrable Adversarial Examples

Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation

Noise or Signal: The Role of Image Backgrounds in Object Recognition

Shapley explainability on the data manifold

Improving Transformation Invariance in Contrastive Representation Learning

Learning "What-if" Explanations for Sequential Decision-Making

A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention

Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking

Graph Convolution with Low-rank Learnable Local Filters

Meta-Learning with Neural Tangent Kernels

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization

Colorization Transformer

Human-Level Performance in No-Press Diplomacy via Equilibrium Search

Separation and Concentration in Deep Networks

Training GANs with Stronger Augmentations via Contrastive Discriminator

Locally Free Weight Sharing for Network Width Search

Language-Agnostic Representation Learning of Source Code from Structure and Context

Learning Mesh-Based Simulation with Graph Networks

Set Prediction without Imposing Structure as Conditional Density Estimation

Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search

What are the Statistical Limits of Offline RL with Linear Function Approximation?

Learning Accurate Entropy Model with Global Reference for Image Compression

What Makes Instance Discrimination Good for Transfer Learning?

Improving Adversarial Robustness via Channel-wise Activation Suppressing

A unifying view on implicit bias in training linear neural networks

Representation Learning for Sequence Data with Deep Autoencoding Predictive Components

Policy-Driven Attack: Learning to Query for Hard-label Black-box Adversarial Examples

Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate

Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency

Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration

UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers

A Good Image Generator Is What You Need for High-Resolution Video Synthesis

What Should Not Be Contrastive in Contrastive Learning

A Design Space Study for LISTA and Beyond

Rethinking Soft Labels for Knowledge Distillation: A Bias–Variance Tradeoff Perspective

Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval

Hierarchical Reinforcement Learning by Discovering Intrinsic Options

Denoising Diffusion Implicit Models

Intraclass clustering: an implicit learning ability that regularizes DNNs

Contrastive Learning with Hard Negative Samples

Discrete Graph Structure Learning for Forecasting Multiple Time Series

Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs

Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration

Data-Efficient Reinforcement Learning with Self-Predictive Representations

A Distributional Approach to Controlled Text Generation

A Block Minifloat Representation for Training Deep Neural Networks

On the Impossibility of Global Convergence in Multi-Loss Optimization

Self-supervised Representation Learning with Relative Predictive Coding

Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

Heating up decision boundaries: isocapacitory saturation, adversarial scenarios and generalization bounds

Rethinking Architecture Selection in Differentiable NAS

CaPC Learning: Confidential and Private Collaborative Learning

Incorporating Symmetry into Deep Dynamics Models for Improved Generalization

Dataset Condensation with Gradient Matching

PMI-Masking: Principled masking of correlated spans

Sharpness-aware Minimization for Efficiently Improving Generalization

Learning with AMIGo: Adversarially Motivated Intrinsic Goals

Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning

End-to-end Adversarial Text-to-Speech

SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness

not-MIWAE: Deep Generative Modelling with Missing not at Random Data

Distilling Knowledge from Reader to Retriever for Question Answering

Adaptive Extra-Gradient Methods for Min-Max Optimization and Games

Training with Quantization Noise for Extreme Model Compression

The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods

IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning

ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations

Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity

Conditional Generative Modeling via Learning the Latent Space

When Optimizing $f$-Divergence is Robust with Label Noise

Contrastive Learning with Adversarial Perturbations for Conditional Text Generation

Self-Supervised Policy Adaptation during Deployment

Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

Geometry-aware Instance-reweighted Adversarial Training

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Learning with Instance-Dependent Label Noise: A Sample Sieve Approach

Bag of Tricks for Adversarial Training

DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation

The Risks of Invariant Risk Minimization

DOP: Off-Policy Multi-Agent Decomposed Policy Gradients

Generative Time-series Modeling with Fourier Flows

Individually Fair Gradient Boosting

Federated Learning Based on Dynamic Regularization

Contemplating Real-World Object Classification

When Do Curricula Work?

Learning Neural Event Functions for Ordinary Differential Equations

Mastering Atari with Discrete World Models

Getting a CLUE: A Method for Explaining Uncertainty Estimates

Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy

DeLighT: Deep and Light-weight Transformer

Domain Generalization with MixStyle

Concept Learners for Few-Shot Learning

Creative Sketch Generation

Rethinking Embedding Coupling in Pre-trained Language Models

How Does Mixup Help With Robustness and Generalization?

Lifelong Learning of Compositional Structures

Debiasing Concept-based Explanations with Causal Analysis

Learning to Represent Action Values as a Hypergraph on the Action Vertices

Collective Robustness Certificates: Exploiting Interdependence in Graph Neural Networks

Rethinking Attention with Performers

Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods

Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator

Mutual Information State Intrinsic Control

Learning explanations that are hard to vary

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Physics-aware, probabilistic model order reduction with guaranteed stability

RODE: Learning Roles to Decompose Multi-Agent Tasks

Neural gradients are near-lognormal: improved quantized and sparse training

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

Property Controllable Variational Autoencoder via Invertible Mutual Dependence

Neural Thompson Sampling

Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization

Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

Effective and Efficient Vote Attack on Capsule Networks

Information Laundering for Model Privacy

Isometric Propagation Network for Generalized Zero-shot Learning

Learning with Feature-Dependent Label Noise: A Progressive Approach

SEED: Self-supervised Distillation For Visual Representation

Unsupervised Audiovisual Synthesis via Exemplar Autoencoders

Learning Energy-Based Generative Models via Coarse-to-Fine Expanding and Sampling

DDPNOpt: Differential Dynamic Programming Neural Optimizer

Mirostat: A Neural Text Decoding Algorithm That Directly Controls Perplexity

Contextual Dropout: An Efficient Sample-Dependent Dropout Module

Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry

Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with $1/n$ Parameters

Large Associative Memory Problem in Neurobiology and Machine Learning

Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation

On the Dynamics of Training Attention Models

INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving

A Critique of Self-Expressive Deep Subspace Clustering

Learning to Recombine and Resample Data For Compositional Generalization

Learning Generalizable Visual Representations via Interactive Gameplay

Overparameterisation and worst-case generalisation: friend or foe?

Calibration tests beyond classification

On the Transfer of Disentangled Representations in Realistic Settings

Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS

Revisiting Few-sample BERT Fine-tuning

Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning

SSD: A Unified Framework for Self-Supervised Outlier Detection

Long-tail learning via logit adjustment

Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval

Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments

Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms

Understanding the role of importance weighting for deep learning

LEAF: A Learnable Frontend for Audio Classification

Monotonic Kronecker-Factored Lattice

Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data

Vulnerability-Aware Poisoning Mechanism for Online RL with Unknown Dynamics

Wasserstein-2 Generative Networks

Emergent Road Rules In Multi-Agent Driving Environments

Iterative Empirical Game Solving via Single Policy Best Response

Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes

Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule

Understanding the failure modes of out-of-distribution generalization

Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs

Hopfield Networks is All You Need

The Importance of Pessimism in Fixed-Dataset Policy Optimization

Representation Balancing Offline Model-based Reinforcement Learning

FairBatch: Batch Selection for Model Fairness

Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks

Systematic generalisation with group invariant predictions

Efficient Inference of Flexible Interaction in Spiking-neuron Networks

Graph Coarsening with Neural Networks

Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks

Are wider nets better given the same number of parameters?

Autoregressive Entity Retrieval

DARTS-: Robustly Stepping out of Performance Collapse Without Indicators

Adversarially Guided Actor-Critic

Balancing Constraints and Rewards with Meta-Gradient D4PG

Auxiliary Learning by Implicit Differentiation

Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent

Large-width functional asymptotics for deep Gaussian neural networks

Deciphering and Optimizing Multi-Task Learning: a Random Matrix Approach

Free Lunch for Few-shot Learning: Distribution Calibration

Generalized Multimodal ELBO

Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits

Convex Regularization behind Neural Reconstruction

Efficient Certified Defenses Against Patch Attacks on Image Classifiers

Learning Neural Generative Dynamics for Molecular Conformation Generation

Individually Fair Rankings

Hierarchical Autoregressive Modeling for Neural Video Compression

Robust Reinforcement Learning on State Observations with Learned Optimal Adversary

How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima

Evaluation of Similarity-based Explanations

Geometry-Aware Gradient Algorithms for Neural Architecture Search

Open Question Answering over Tables and Text

The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods

VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models

Self-supervised Learning from a Multi-view Perspective

Fair Mixup: Fairness via Interpolation

Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

Removing Undesirable Feature Contributions Using Out-of-Distribution Data

Meta-learning Symmetries by Reparameterization

How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision

For self-supervised learning, Rationality implies generalization, provably

A Temporal Kernel Approach for Deep Learning with Continuous-time Information

Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors

Conservative Safety Critics for Exploration

Model-Based Visual Planning with Self-Supervised Functional Distances

GraphCodeBERT: Pre-training Code Representations with Data Flow

No MCMC for me: Amortized sampling for fast and stable training of energy-based models

BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction

Predicting Classification Accuracy When Adding New Unobserved Classes

Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors

Learning the Pareto Front with Hypernetworks

Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime

Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows

MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space

Impact of Representation Learning in Linear Bandits

EEC: Learning to Encode and Regenerate Images for Continual Learning

What Can You Learn From Your Muscles? Learning Visual Representation from Human Interactions

Improving VAEs' Robustness to Adversarial Attack

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers

Control-Aware Representations for Model-based Reinforcement Learning

Scaling Symbolic Methods using Gradients for Neural Model Explanation

Empirical or Invariant Risk Minimization? A Sample Complexity Perspective

CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning

Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients

Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability

The geometry of integration in text classification RNNs

On the Bottleneck of Graph Neural Networks and its Practical Implications

Learning to Reach Goals via Iterated Supervised Learning

On the Critical Role of Conventions in Adaptive Human-AI Collaboration

CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks

Discovering a set of policies for the worst case reward

Learning perturbation sets for robust machine learning

Primal Wasserstein Imitation Learning

A Universal Representation Transformer Layer for Few-Shot Image Classification

MoPro: Webly Supervised Learning with Momentum Prototypes

Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU

Optimism in Reinforcement Learning with Generalized Linear Function Approximation

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Variational Information Bottleneck for Effective Low-Resource Fine-Tuning

Expressive Power of Invariant and Equivariant Graph Neural Networks

On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines

Computational Separation Between Convolutional and Fully-Connected Networks

Probabilistic Numeric Convolutional Neural Networks

FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders

Coping with Label Shift via Distributionally Robust Optimisation

MixKD: Towards Efficient Distillation of Large-scale Language Models

Learning a Latent Simplex in Input Sparsity Time

Teaching with Commentaries

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Fantastic Four: Differentiable and Efficient Bounds on Singular Values of Convolution Layers

Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data

Negative Data Augmentation

Scalable Transfer Learning with Expert Models

A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels

Learning A Minimax Optimizer: A Pilot Study

Meta Back-Translation

Optimal Regularization can Mitigate Double Descent

Net-DNF: Effective Deep Modeling of Tabular Data

MultiModalQA: complex question answering over text, tables and images

Dynamic Tensor Rematerialization

AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models

Few-Shot Learning via Learning the Representation, Provably

Wandering within a world: Online contextualized few-shot learning

WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic

Nearest Neighbor Machine Translation

Knowledge distillation via softmax regression representation learning

Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition

Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design

Neural Pruning via Growing Regularization

Mixed-Features Vectors and Subspace Splitting

Graph-Based Continual Learning

Sparse Quantized Spectral Clustering

Taking Notes on the Fly Helps Language Pre-Training

Explainable Deep One-Class Classification

Revisiting Dynamic Convolution via Matrix Decomposition

BiPointNet: Binary Neural Network for Point Clouds

Prediction and generalisation over directed actions by grid cells

Continual learning in recurrent neural networks

Neural networks with late-phase weights

Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization

HyperGrid Transformers: Towards A Single Model for Multiple Tasks

Learning Robust State Abstractions for Hidden-Parameter Block MDPs

Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning

Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning

Is Attention Better Than Matrix Decomposition?

Learning Incompressible Fluid Dynamics from Scratch - Towards Fast, Differentiable Fluid Models that Generalize

Refining Deep Generative Models via Discriminator Gradient Flow

Entropic gradient descent algorithms and wide flat minima

New Bounds For Distributed Mean Estimation and Variance Reduction

Learning Value Functions in Deep Policy Gradients using Residual Variance

Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic

No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

Parameter-Based Value Functions

CoCon: A Self-Supervised Approach for Controlled Text Generation

MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning

Beyond Categorical Label Representations for Image Classification

Meta-learning with negative learning rates

Provable Rich Observation Reinforcement Learning with Combinatorial Latent States

Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime

Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoF

Contrastive Syn-to-Real Generalization

Evolving Reinforcement Learning Algorithms

Neural Topic Model via Optimal Transport

Class Normalization for (Continual)? Generalized Zero-Shot Learning

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction

A Gradient Flow Framework For Analyzing Network Pruning

A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks

Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes

Knowledge Distillation as Semiparametric Inference

NBDT: Neural-Backed Decision Tree

Deep Equals Shallow for ReLU Networks in Kernel Regimes

Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling

MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering

Mind the Pad -- CNNs Can Develop Blind Spots

A PAC-Bayesian Approach to Generalization Bounds for Graph Neural Networks

Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs

Interpretable Models for Granger Causality Using Self-explaining Neural Networks

Estimating Lipschitz constants of monotone deep equilibrium models

Probing BERT in Hyperbolic Spaces

Batch Reinforcement Learning Through Continuation Method

Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning

Diverse Video Generation using a Gaussian Process Trigger

Learning and Evaluating Representations for Deep One-Class Classification

Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers

Retrieval-Augmented Generation for Code Summarization via Hybrid GNN

Randomized Automatic Differentiation

Generalized Energy Based Models

Temporally-Extended ε-Greedy Exploration

Multiplicative Filter Networks

Clustering-friendly Representation Learning via Instance Discrimination and Feature Decorrelation

FedMix: Approximation of Mixup under Mean Augmented Federated Learning

Self-training For Few-shot Transfer Across Extreme Task Differences

VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments

Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks

Generalization bounds via distillation

Model Patching: Closing the Subgroup Performance Gap with Data Augmentation

A Learning Theoretic Perspective on Local Explainability

Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions

Learning to Set Waypoints for Audio-Visual Navigation

Partitioned Learned Bloom Filters

Unsupervised Meta-Learning through Latent-Space Interpolation in Generative Models

Generalized Variational Continual Learning

Robust and Generalizable Visual Representation Learning via Random Convolutions

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks

X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback

Attentional Constellation Nets for Few-Shot Learning

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech

Risk-Averse Offline Reinforcement Learning

Spatio-Temporal Graph Scattering Transform

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks

Async-RED: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors

Communication in Multi-Agent Reinforcement Learning: Intention Sharing

Few-Shot Bayesian Optimization with Deep Kernel Surrogates

Disentangled Recurrent Wasserstein Autoencoder

In Search of Lost Domain Generalization

Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization

Implicit Normalizing Flows

VTNet: Visual Transformer Network for Object Goal Navigation

Learning Task Decomposition with Ordered Memory Policy Network

Deep Networks and the Multiple Manifold Problem

Learning What To Do by Simulating the Past

Progressive Skeletonization: Trimming more fat from a network at initialization

$i$-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning

Topology-Aware Segmentation Using Discrete Morse Theory

On Dyadic Fairness: Exploring and Mitigating Bias in Graph Connections

Tent: Fully Test-Time Adaptation by Entropy Minimization

Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding

Dataset Inference: Ownership Resolution in Machine Learning

Regularized Inverse Reinforcement Learning

Fast And Slow Learning Of Recurrent Independent Mechanisms

Learning Safe Multi-agent Control with Decentralized Neural Barrier Certificates

Semi-supervised Keypoint Localization

Representing Partial Programs with Blended Abstract Semantics

FastSpeech 2: Fast and High-Quality End-to-End Text to Speech

Unlearnable Examples: Making Personal Data Unexploitable

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression

Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization

Identifying Physical Law of Hamiltonian Systems via Meta-Learning

Text Generation by Learning from Demonstrations

Unbiased Teacher for Semi-Supervised Object Detection

Estimating informativeness of samples with Smooth Unique Information

Efficient Generalized Spherical CNNs

DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues

Global Convergence of Three-layer Neural Networks in the Mean Field Regime

Multi-Time Attention Networks for Irregularly Sampled Time Series

Linear Convergent Decentralized Optimization with Compression

Adaptive Federated Optimization

Auction Learning as a Two-Player Game

Analyzing the Expressive Power of Graph Neural Networks in a Spectral Perspective

Exemplary Natural Images Explain CNN Activations Better than State-of-the-Art Feature Visualization

Interpreting Knowledge Graph Relation Representation from Word Embeddings

Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction

DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation

Learning-based Support Estimation in Sublinear Time

Integrating Categorical Semantics into Unsupervised Domain Translation

The Recurrent Neural Tangent Kernel

C-Learning: Learning to Achieve Goals via Recursive Classification

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study

Influence Estimation for Generative Adversarial Networks

Clairvoyance: A Pipeline Toolkit for Medical Time Series

Randomized Ensembled Double Q-Learning: Learning Fast Without a Model

Learning advanced mathematical computations from examples

Noise against noise: stochastic label noise helps combat inherent label noise

Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning

Protecting DNNs from Theft using an Ensemble of Diverse Models

You Only Need Adversarial Supervision for Semantic Image Synthesis

Neural Spatio-Temporal Point Processes

Linear Mode Connectivity in Multitask and Continual Learning

Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral

Influence Functions in Deep Learning Are Fragile

Categorical Normalizing Flows via Continuous Transformations

Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation

Directed Acyclic Graph Neural Networks

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models

SOLAR: Sparse Orthogonal Learned and Random Embeddings

Bayesian Context Aggregation for Neural Processes

Cut out the annotator, keep the cutout: better segmentation with weak supervision

Effective Abstract Reasoning with Dual-Contrast Network

Personalized Federated Learning with First Order Model Optimization

Task-Agnostic Morphology Evolution

Learning Parametrised Graph Shift Operators

Online Adversarial Purification based on Self-supervised Learning