

Spotlight Poster

On the Role of Discrete Tokenization in Visual Representation Learning

Tianqi Du · Yifei Wang · Yisen Wang

Halle B
Fri 10 May 7:30 a.m. PDT — 9:30 a.m. PDT
 

Abstract:

In the realm of self-supervised learning (SSL), masked image modeling (MIM) has gained popularity alongside contrastive learning methods. MIM involves reconstructing masked regions of input images from the unmasked portions. A notable subset of MIM methodologies employs discrete visual tokens as reconstruction targets. This study explores the role of discrete visual tokens in MIM, with the aim of understanding their potential benefits and inherent constraints. Building upon the connection between MIM and contrastive learning, we provide comprehensive explanations of how discrete tokenization affects the generalization performance of MIM. Furthermore, we introduce a novel metric designed to quantify the proficiency of discrete visual tokens in the MIM framework. Inspired by this metric, we contribute an accessible tokenizer design and demonstrate its superior performance across various benchmark datasets and ViT backbones.
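To make the MIM setup concrete, the following is a minimal toy sketch of how discrete visual tokens can serve as reconstruction targets: a simple nearest-codebook vector quantizer stands in for a learned tokenizer, and masked patches are assigned the discrete token ids a model would be trained to predict. The patch dimensions, codebook size, and mask ratio here are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def tokenize(patches, codebook):
    """Assign each patch the index of its nearest codebook vector.

    This is a simple vector-quantization stand-in for a learned
    discrete tokenizer (hypothetical, for illustration only).
    """
    # Squared Euclidean distance from every patch to every code vector.
    dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

def mim_targets(image_patches, codebook, mask_ratio=0.5):
    """Pick a random subset of patches to mask and return the discrete
    token ids that an MIM model would be trained to predict for them."""
    n = len(image_patches)
    n_masked = int(n * mask_ratio)
    masked_idx = rng.choice(n, size=n_masked, replace=False)
    targets = tokenize(image_patches[masked_idx], codebook)
    return masked_idx, targets

# Toy data: 16 patches with 8-dim features, a 32-entry token codebook.
patches = rng.normal(size=(16, 8))
codebook = rng.normal(size=(32, 8))
masked_idx, targets = mim_targets(patches, codebook)
print(len(masked_idx), len(targets))  # 8 masked patches, 8 token targets
```

In an actual MIM pipeline the model would see only the unmasked patches and be trained with a cross-entropy loss over these discrete targets, rather than a pixel-regression loss.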
