

Virtual presentation / poster accept

Progressively Compressed Auto-Encoder for Self-supervised Representation Learning

Li Jin · Yaoming Wang · Xiaopeng Zhang · Yabo Chen · Dongsheng Jiang · Wenrui Dai · Chenglin Li · Hongkai Xiong · Qi Tian

Keywords: [ MIM ] [ self-supervised learning ] [ transformer ] [ Unsupervised and Self-supervised learning ]


Abstract:

As a typical self-supervised learning strategy, Masked Image Modeling (MIM) is driven by recovering all masked patches from the visible ones. However, patches from the same image are highly correlated, so reconstructing all of the masked patches is redundant. We find that existing MIM-based methods neglect this redundancy, which incurs non-negligible computational overhead without necessarily benefiting the learned representation. In this paper, we present a novel approach named PCAE, short for Progressively Compressed AutoEncoder, which addresses the redundant reconstruction issue by progressively compacting tokens and retaining only the information necessary for forward propagation and reconstruction. In particular, we identify redundant tokens in an image via a simple yet effective similarity metric between each token and the mean of the token sequence. Redundant tokens, which can likely be represented by the remaining ones, are progressively dropped during forward propagation, and, importantly, we reconstruct only the retained tokens. As a result, we achieve a better trade-off between performance and efficiency for pre-training. Moreover, thanks to this flexible strategy, PCAE can also be applied directly to downstream fine-tuning tasks and enables scalable deployment. Experiments show that PCAE achieves performance comparable to MAE with only 1/8 of MAE's GPU days. The code is available at https://github.com/caddyless/PCAE/.
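The token-compression idea described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration in plain Python, not the PCAE implementation from the linked repository. It assumes cosine similarity as the similarity metric, assumes that tokens most similar to the sequence mean are the redundant ones, uses fixed per-stage keep counts in place of whatever schedule the paper uses, and computes a mean-squared reconstruction loss over the retained positions only. The names `redundancy_scores`, `compress_once`, `progressive_compress`, and `retained_reconstruction_loss` are illustrative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_token(tokens: Sequence[Vector]) -> Vector:
    """Element-wise mean of the token sequence."""
    n, dim = len(tokens), len(tokens[0])
    return [sum(t[d] for t in tokens) / n for d in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot / (na * nb)


def redundancy_scores(tokens: Sequence[Vector]) -> List[float]:
    """Similarity of each token to the sequence mean.

    Assumption (hypothetical): a higher score means the token is closer to
    the "average" content and is therefore treated as more redundant.
    """
    mu = mean_token(tokens)
    return [cosine(t, mu) for t in tokens]


def compress_once(tokens: Sequence[Vector], n_keep: int) -> List[int]:
    """Return indices of the n_keep least redundant tokens, in original order."""
    if not 1 <= n_keep <= len(tokens):
        raise ValueError("n_keep must be between 1 and len(tokens)")
    scores = redundancy_scores(tokens)
    least_redundant = sorted(range(len(tokens)), key=lambda i: scores[i])[:n_keep]
    return sorted(least_redundant)


def progressive_compress(tokens: Sequence[Vector], keep_counts: Sequence[int]) -> List[int]:
    """Drop tokens stage by stage and return the original indices that survive.

    In a real encoder, each stage would also pass the kept tokens through
    transformer blocks before rescoring; here the features stay unchanged.
    """
    indices = list(range(len(tokens)))
    current = list(tokens)
    for n_keep in keep_counts:
        kept = compress_once(current, n_keep)
        indices = [indices[i] for i in kept]
        current = [current[i] for i in kept]
    return indices


def retained_reconstruction_loss(
    pred: Sequence[Vector], target: Sequence[Vector], retained: Sequence[int]
) -> float:
    """Mean squared error averaged over the retained positions only."""
    per_token = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i in retained
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Three near-duplicate tokens and one distinct token.
    tokens = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1], [0.0, 1.0]]
    kept = progressive_compress(tokens, [3, 2])
    print(kept)  # prints [2, 3]: the distinct token (index 3) survives both stages
```

In this toy example, both the encoder's sequence length and the set of positions contributing to the loss shrink at each stage, which is the source of the efficiency gain the abstract describes; the keep counts here are hypothetical hyperparameters.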
