

Poster

A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation

Zhengbo Wang · Jian Liang · Lijun Sheng · Ran He · Zilei Wang · Tieniu Tan

Halle B

Abstract:

Contrastive Language-Image Pretraining (CLIP) has gained popularity for its remarkable zero-shot capacity. Recent research has focused on developing efficient fine-tuning methods, such as prompt learning and adapter, to enhance CLIP’s performance in downstream tasks. However, these methods still require additional training time and computational resources, which is undesirable for devices with limited resources. In this paper, we revisit a classical algorithm, Gaussian Discriminant Analysis (GDA), and apply it to the downstream classification of CLIP. Typically, GDA assumes that features of each class follow Gaussian distributions with identical covariance. By leveraging Bayes’ formula, the classifier can be expressed in terms of the class means and covariance, which can be estimated from the data without the need for training. To integrate knowledge from both visual and textual modalities, we ensemble it with the original zero-shot classifier within CLIP. Extensive results on 17 datasets validate that our method surpasses or achieves comparable results with state-of-the-art methods on few-shot classification, imbalanced learning, and out-of-distribution generalization. In addition, we extend our method to base-to-new generalization and unsupervised learning, once again demonstrating its superiority over competing approaches.
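
The closed-form classifier described in the abstract can be illustrated with a short NumPy sketch. It is not the authors' implementation. It only shows how a linear classifier follows from class means and a shared covariance via Bayes' formula, and how its logits might be added to zero-shot logits. The function names `fit_gda` and `ensemble_logits`, the ridge term added to the covariance, the mixing weight `alpha`, and the logit scale `scale` are all assumptions introduced here for illustration.

```python
import numpy as np


def fit_gda(features, labels, num_classes, ridge=1e-3):
    """Closed-form GDA with a covariance shared by all classes.

    features: (N, d) array of image features (e.g. from a frozen encoder).
    labels:   (N,) integer class labels in [0, num_classes).
    ridge:    hypothetical regularizer so the covariance is invertible
              when there are few samples; not taken from the abstract.
    Assumes every class has at least one sample.
    """
    d = features.shape[1]
    # Per-class means, shape (K, d).
    means = np.stack(
        [features[labels == k].mean(axis=0) for k in range(num_classes)]
    )
    # Shared covariance estimated from features centered on their class mean.
    centered = features - means[labels]
    cov = centered.T @ centered / len(features)
    precision = np.linalg.inv(cov + ridge * np.eye(d))
    # Class priors from label frequencies.
    priors = np.bincount(labels, minlength=num_classes) / len(labels)
    # Bayes' formula with Gaussian class-conditionals gives a linear rule:
    #   w_k = Sigma^{-1} mu_k,  b_k = log pi_k - 0.5 * mu_k^T Sigma^{-1} mu_k
    weights = means @ precision  # (K, d); precision is symmetric
    bias = np.log(priors) - 0.5 * np.sum(weights * means, axis=1)
    return weights, bias


def ensemble_logits(features, text_weights, weights, bias, alpha=1.0, scale=100.0):
    """Add GDA logits to zero-shot logits.

    text_weights: (K, d) class text embeddings acting as the zero-shot classifier.
    alpha, scale: hypothetical mixing weight and logit scale (assumptions).
    """
    zero_shot = scale * features @ text_weights.T
    gda = features @ weights.T + bias
    return zero_shot + alpha * gda
```

No gradient step appears anywhere: the means, covariance, and priors are computed once from the labeled features, which is what makes the approach training-free. In practice `alpha` would need to be chosen on held-out data.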
