

Poster

Masked Distillation Advances Self-Supervised Transformer Architecture Search

Caixia Yan · Xiaojun Chang · Zhihui Li · Lina Yao · Minnan Luo · Qinghua Zheng

Halle B
Tue 7 May 1:45 a.m. PDT — 3:45 a.m. PDT

Abstract:

Transformer architecture search (TAS) has achieved remarkable progress in automating the design of vision transformer architectures. Recent TAS advances have discovered outstanding transformer architectures while saving tremendous labor from human experts. However, these methods remain cumbersome to deploy in real-world applications because the supervised learning paradigm they rely on incurs expensive data-labeling costs. To this end, this paper proposes a masked image modelling (MIM) based self-supervised neural architecture search method designed specifically for vision transformers, termed MaskTAS, which entirely avoids the expensive data-labeling costs inherited from supervised learning. Built on the one-shot NAS framework, MaskTAS must train various weight-sharing subnets, which can easily diverge without strong supervision in MIM-based self-supervised learning. To address this issue, we design the search space of MaskTAS as a Siamese teacher-student architecture that distills knowledge from pre-trained networks, allowing for efficient training of the transformer supernet. To achieve self-supervised transformer architecture search, we further design a novel unsupervised evaluation metric for the evolutionary search algorithm, in which each candidate of the student branch is rated by measuring its consistency with the larger teacher network. Extensive experiments demonstrate that the searched architectures achieve state-of-the-art accuracy on benchmark datasets even without using manual labels. Moreover, the proposed MaskTAS generalizes well to various data domains and tasks by searching specialized transformer architectures in a self-supervised manner.
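
To make the two label-free components of the abstract concrete, below is a minimal, hypothetical PyTorch sketch of (a) a masked-distillation training signal for the supernet and (b) a teacher-student consistency score usable as a fitness value in the evolutionary search. All names (teacher, student, loader), the pixel-space patch masking, and the cosine-similarity choice are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of the label-free components described in the abstract.
# NOT the authors' code: module names, the pixel-space patch masking, and the
# cosine-similarity metric are all assumptions made for illustration.
import torch
import torch.nn.functional as F


def random_patch_mask(images, patch=16, ratio=0.75):
    """Zero out a random subset of patches: a crude pixel-space stand-in
    for the token masking of a real masked-image-modelling pipeline."""
    B, C, H, W = images.shape
    ph, pw = H // patch, W // patch
    keep = torch.rand(B, 1, ph, pw, device=images.device) > ratio
    mask = keep.float().repeat_interleave(patch, dim=2)
    mask = mask.repeat_interleave(patch, dim=3)
    return images * mask


def masked_distill_loss(teacher, student, images):
    """Supernet training signal: the student branch sees masked images and
    regresses the frozen teacher's features on the intact images, so no
    manual labels are required."""
    with torch.no_grad():
        target = teacher(images)                   # (B, D) teacher features
    pred = student(random_patch_mask(images))      # student sees masked input
    return F.smooth_l1_loss(pred, target)


@torch.no_grad()
def consistency_score(teacher, subnet, loader, device="cuda"):
    """Unsupervised fitness for the evolutionary search: average cosine
    similarity between a candidate subnet's features and the larger
    teacher's features on unlabeled images; higher is better."""
    teacher.eval()
    subnet.eval()
    total, count = 0.0, 0
    for images in loader:                          # unlabeled images only
        images = images.to(device)
        sim = F.cosine_similarity(teacher(images), subnet(images), dim=-1)
        total += sim.sum().item()
        count += sim.numel()
    return total / count
```

In the actual method the masking and distillation operate on transformer tokens inside a weight-sharing supernet, but the sketch captures why neither the supernet training nor the candidate ranking needs labeled data.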
