

Poster in Workshop: 3rd Workshop on practical ML for Developing Countries: learning under limited/low resource scenarios

KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation

Weijie Xu · Xiaoyu Jiang · Jay Desai · Bin Han · Fuqin Yan · Francis Iannacci


Abstract:

In text classification tasks, fine-tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods have the advantage of analyzing documents to extract meaningful patterns of words without the need for pretraining. To leverage topic modeling's unsupervised insight extraction in text classification tasks, we develop Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings and only a few labeled documents, and it is efficient to train, making it ideal under resource-constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness, and efficiency, and achieves performance similar to state-of-the-art weakly supervised text classification methods.
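The abstract does not describe KDSTM's architecture, so the sketch below is only a rough illustration of the knowledge-distillation idea it names: a teacher classifier fit on the few labeled documents produces soft labels, and a small neural topic model (a student) is trained on both a bag-of-words reconstruction objective and a distillation term toward those soft labels. The autoencoder structure, module names, and hyperparameters (temperature, alpha) are assumptions for illustration, not the paper's actual method.

# Minimal sketch, assuming a linear autoencoder topic model and standard
# Hinton-style distillation; none of these details come from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StudentTopicModel(nn.Module):
    def __init__(self, vocab_size: int, n_topics: int, n_classes: int):
        super().__init__()
        self.encoder = nn.Linear(vocab_size, n_topics)    # doc -> topic mixture
        self.decoder = nn.Linear(n_topics, vocab_size)    # topic mixture -> word logits
        self.classifier = nn.Linear(n_topics, n_classes)  # topic mixture -> class logits

    def forward(self, bow):
        theta = F.softmax(self.encoder(bow), dim=-1)      # topic proportions per document
        recon = self.decoder(theta)                       # reconstructed word distribution (logits)
        logits = self.classifier(theta)                   # class predictions from topics
        return recon, logits

def distillation_step(model, bow, teacher_logits, temperature=2.0, alpha=0.5):
    """One training step: the reconstruction term keeps topics descriptive of
    the corpus; the KL term distills the teacher's soft labels into the student."""
    recon, student_logits = model(bow)
    # Bag-of-words negative log-likelihood of the reconstruction.
    recon_loss = -(F.log_softmax(recon, dim=-1) * bow).sum(dim=-1).mean()
    # Temperature-softened KL divergence between student and teacher predictions.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * kd_loss + (1 - alpha) * recon_loss

# Toy usage with random data; teacher_logits would come from a classifier
# fit on the few labeled documents.
model = StudentTopicModel(vocab_size=2000, n_topics=20, n_classes=4)
bow = torch.rand(8, 2000)
teacher_logits = torch.randn(8, 4)
loss = distillation_step(model, bow, teacher_logits)
loss.backward()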
