Poster in Workshop: 3rd Workshop on Practical ML for Developing Countries: Learning Under Limited/Low Resource Scenarios
KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation
Weijie Xu · Xiaoyu Jiang · Jay Desai · Bin Han · Fuqin Yan · Francis Iannacci
In text classification tasks, fine-tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods have the advantage of analyzing documents to extract meaningful patterns of words without the need for pretraining. To leverage topic modeling's unsupervised insight extraction for text classification tasks, we develop Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings and only a few labeled documents, and is efficient to train, making it ideal under resource-constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness, and efficiency, and achieves performance comparable to state-of-the-art weakly supervised text classification methods.
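The abstract does not spell out the training objective, so the following is only a minimal sketch of the general idea of combining a neural topic model's unsupervised reconstruction loss with a knowledge-distillation term on a small set of labeled documents. It is not the authors' implementation; the network sizes, the stand-in teacher probabilities, the temperature, and the loss weight are all illustrative assumptions.

```python
# Hedged sketch: unsupervised topic-model reconstruction + distillation on few labels.
# All hyperparameters and the teacher signal below are assumptions, not KDSTM's actual setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, TOPICS, CLASSES = 2000, 20, 4  # illustrative sizes

class NeuralTopicModel(nn.Module):
    """Encoder maps a bag-of-words vector to a topic mixture; decoder reconstructs the BoW."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(VOCAB, 128), nn.ReLU(), nn.Linear(128, TOPICS))
        self.decoder = nn.Linear(TOPICS, VOCAB)       # topic-word weights
        self.classifier = nn.Linear(TOPICS, CLASSES)  # student head trained via distillation

    def forward(self, bow):
        theta = F.softmax(self.encoder(bow), dim=-1)            # document-topic distribution
        recon = F.log_softmax(self.decoder(theta), dim=-1)      # log word probabilities
        logits = self.classifier(theta)                          # class logits from topics
        return theta, recon, logits

model = NeuralTopicModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: many unlabeled documents plus a handful with teacher soft labels.
bow_unlabeled = torch.rand(32, VOCAB)
bow_labeled = torch.rand(4, VOCAB)
teacher_probs = F.softmax(torch.rand(4, CLASSES), dim=-1)  # stand-in for a teacher's output

T, alpha = 2.0, 0.5  # distillation temperature and loss weight (assumed values)

_, recon_u, _ = model(bow_unlabeled)
recon_loss = -(bow_unlabeled * recon_u).sum(dim=-1).mean()  # bag-of-words negative log-likelihood

_, _, logits_l = model(bow_labeled)
distill_loss = F.kl_div(F.log_softmax(logits_l / T, dim=-1),
                        teacher_probs, reduction="batchmean") * T * T

loss = recon_loss + alpha * distill_loss
opt.zero_grad()
loss.backward()
opt.step()
print(f"recon={recon_loss.item():.3f}  distill={distill_loss.item():.3f}")
```

Because the only supervised signal enters through the small distillation term, a setup like this needs neither pretrained embeddings nor a large labeled corpus, which is the resource-constrained regime the abstract targets.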