

Poster

On Harmonizing Implicit Subpopulations

Feng Hong · Jiangchao Yao · YUEMING LYU · Zhihan Zhou · Ivor Tsang · Ya Zhang · Yanfeng Wang

Halle B
Tue 7 May 7:30 a.m. PDT — 9:30 a.m. PDT

Abstract:

Machine learning algorithms trained on skewed distributions usually generalize poorly, especially when performance parity is an important criterion. The problem is harder on class-balanced data that contains hidden imbalanced subpopulations, since prevalent techniques mainly perform class-level calibration and cannot make subpopulation-level adjustments without explicit subpopulation quantities. For this implicit subpopulation imbalance, we show that the key to alleviating the detrimental effect lies in effective subpopulation discovery combined with proper rebalancing. We then propose a novel subpopulation-imbalanced learning method, termed Scatter and HarmonizE (SHE). Our method is built on the guiding principle of optimal data partition, which assigns data to subpopulations in a manner that maximizes the predictive information from inputs to labels. Backed by theoretical guarantees and empirical evidence, SHE succeeds in identifying the hidden subpopulations and encourages subpopulation-balanced predictions. Extensive experiments on various benchmark datasets show the effectiveness of SHE compared with a broad range of baselines.
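
To make the setting concrete, the following is a minimal sketch of what "class-balanced data with hidden imbalanced subpopulations" can look like, and of one generic way to rebalance once subpopulation assignments are known. It is not the SHE method: SHE discovers the subpopulations itself, whereas this sketch assumes the assignments are already given. The function name `subpopulation_weights` and the toy data are hypothetical, and inverse-frequency weighting is used only as a simple stand-in for rebalancing.

```python
from collections import Counter


def subpopulation_weights(subpop_ids):
    """Return per-sample weights so each subpopulation has equal total weight.

    Assumption: subpopulation ids are already known. Discovering them is the
    hard part addressed by the paper and is not attempted here.
    """
    counts = Counter(subpop_ids)
    n, k = len(subpop_ids), len(counts)
    # Inverse-frequency weighting, normalized so the weights sum to n.
    return [n / (k * counts[s]) for s in subpop_ids]


# Hypothetical toy data: two classes of equal size (class-balanced),
# each hiding a 4:1 split between a majority and a minority subpopulation.
labels = [0] * 5 + [1] * 5
subpops = ["a", "a", "a", "a", "b", "c", "c", "c", "c", "d"]

weights = subpopulation_weights(subpops)

# Total weight carried by each subpopulation after rebalancing.
totals = Counter()
for s, w in zip(subpops, weights):
    totals[s] += w
```

In this toy case the class counts are equal, so class-level reweighting would change nothing, while the subpopulation-level weights give the minority groups `b` and `d` the same total influence as the majority groups `a` and `c`.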
