Spotlight
Information Retention via Learning Supplemental Features
Zhipeng Xie · Yahe Li
The information bottleneck principle provides an information-theoretic method for learning a good representation as a tradeoff between conciseness and predictive ability: it reduces information redundancy, eliminates irrelevant and superfluous features, and thus enhances in-domain generalizability. However, in low-resource or out-of-domain scenarios where the i.i.d. assumption does not necessarily hold, superfluous (or redundant) relevant features may supplement the model's mainline features and be beneficial for making predictions on test data with distribution shifts. To address this problem, we propose to keep as much relevant information as possible in use for making predictions. A three-stage supervised learning framework is designed and implemented to jointly learn the mainline and supplemental features, relieving the supplemental features from suppression by the mainline features. Experiments on image and text classification tasks show that our method substantially outperforms several baseline and state-of-the-art methods, especially in low-resource cases.
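For reference, the conciseness-versus-predictiveness tradeoff mentioned in the abstract is commonly formalized in the standard information bottleneck literature (this is background, not a formula quoted from the paper) as minimizing a Lagrangian of the form

\mathcal{L}_{\mathrm{IB}} = I(X; Z) - \beta\, I(Z; Y),

where Z is the learned representation of the input X, Y is the prediction target, I(\cdot;\cdot) denotes mutual information, and \beta > 0 controls how strongly predictive ability is weighted against compression of the input.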