Virtual presentation / poster accept
Teacher Guided Training: An Efficient Framework for Knowledge Transfer
Manzil Zaheer · Ankit Singh Rawat · Seungyeon Kim · Chong You · Himanshu Jain · Andreas Veit · Rob Fergus · Sanjiv Kumar
Keywords: [ semisupervised learning ] [ Efficient machine learning ] [ knowledge distillation ] [ generalization bounds ] [ distillation ] [ Deep Learning and representational learning ]
The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models into compact models for efficient deployment also necessitates a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a high-quality compact model that leverages the knowledge acquired by pretrained generative models, while obviating the need to go through a large volume of data. TGT exploits the fact that the teacher has acquired a good representation of the underlying data domain, which typically corresponds to a much lower-dimensional manifold than the input space. Furthermore, we can use the teacher to explore the input space more efficiently through sampling or gradient-based methods, which makes TGT especially attractive for limited-data or long-tail settings. We formally capture this benefit of the proposed data-domain exploration in our generalization bounds. We find that TGT can improve accuracy on several image classification benchmarks as well as a range of text classification and retrieval tasks.
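The sampling-based exploration mentioned in the abstract can be pictured with a small toy sketch. It is an illustration under stated assumptions, not the authors' implementation: the teacher here is a hand-written stand-in whose latent space is one-dimensional and whose decoder maps onto the unit circle in a two-dimensional input space. The names `teacher_encode`, `teacher_decode`, `teacher_soft_label`, and `explore_with_teacher`, and the parameters `samples_per_seed` and `sigma`, are all hypothetical.

```python
import math
import random

# Hypothetical toy teacher (an assumption for illustration only).
# Its latent space is 1-D; the decoder maps a latent z onto the unit circle,
# i.e. a 1-D manifold embedded in the 2-D input space.
def teacher_decode(z):
    return (math.cos(z), math.sin(z))

def teacher_encode(x):
    # Inverse of the decoder for points on the circle.
    return math.atan2(x[1], x[0])

def teacher_soft_label(x):
    # Toy two-class soft label: probability of class 1 grows with x[1].
    p1 = 1.0 / (1.0 + math.exp(-4.0 * x[1]))
    return (1.0 - p1, p1)

def explore_with_teacher(seed_inputs, samples_per_seed=4, sigma=0.2, rng=None):
    """Perturb each seed in the teacher's latent space, decode the result,
    and attach the teacher's soft label to every generated input."""
    rng = rng or random.Random(0)
    generated = []
    for x in seed_inputs:
        z = teacher_encode(x)
        for _ in range(samples_per_seed):
            z_new = z + rng.gauss(0.0, sigma)   # small random step in latent space
            x_new = teacher_decode(z_new)       # decoded point stays on the manifold
            generated.append((x_new, teacher_soft_label(x_new)))
    return generated

# Two seed inputs stand in for a small training set.
seeds = [(1.0, 0.0), (0.0, 1.0)]
synthetic = explore_with_teacher(seeds)
```

In this sketch the pairs in `synthetic` would serve as extra training examples for a compact student model. Because the perturbation is applied in the latent space rather than the input space, every generated point lies on the toy manifold, which is the intuition behind exploring a lower-dimensional data domain.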