ICLR Poster Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off

Rahul Rade · Seyed-Mohsen Moosavi-Dezfooli

Keywords: [ robustness ] [ adversarial training ]

Abstract:

While adversarial training has become the de facto approach for training robust classifiers, it leads to a drop in accuracy. This has led prior works to postulate that accuracy is inherently at odds with robustness. Yet the phenomenon remains unexplained. In this paper, we closely examine the changes induced in the decision boundary of a deep network during adversarial training. We find that adversarial training leads to an unwarranted increase in the margin along certain adversarial directions, thereby hurting accuracy. Motivated by this observation, we present a novel algorithm, called Helper-based Adversarial Training (HAT), to reduce this effect by incorporating additional wrongly labelled examples during training. Our proposed method provides a notable improvement in accuracy without compromising robustness. It achieves a better trade-off between accuracy and robustness in comparison to existing defenses. Code is available at https://github.com/imrahulr/hat.
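
The idea of a "wrongly labelled" extra example can be illustrated with a toy linear classifier. The sketch below is not the authors' implementation (see the repository above for that). It assumes, purely for illustration, that a helper example is obtained by stepping past the adversarial perturbation along the same direction, and that its label comes from a separate reference classifier rather than from the ground truth. The names `make_helper`, `ref_model`, `scale`, and all numeric values are hypothetical.

```python
def predict(w, b, x):
    """Label (0 or 1) assigned by the linear classifier w.x + b >= 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else 0


def adversarial_direction(w, label):
    """L-inf direction that pushes a linear model's score away from `label`."""
    sign = -1.0 if label == 1 else 1.0
    return [sign * (1.0 if wi > 0 else -1.0 if wi < 0 else 0.0) for wi in w]


def make_helper(x, label, w_robust, ref_model, eps, scale=2.0):
    """Build one helper example (assumption: extrapolate `scale * eps` along
    the adversarial direction and label the point with `ref_model`)."""
    d = adversarial_direction(w_robust, label)
    helper_x = [xi + scale * eps * di for xi, di in zip(x, d)]
    helper_y = ref_model(helper_x)  # may disagree with the true label
    return helper_x, helper_y


# Hypothetical toy setup: a model being trained and a reference model.
w_robust, b_robust = [2.0, -1.0], 0.0
w_ref, b_ref = [1.0, -1.0], 0.0

x, y = [0.3, 0.0], 1  # clean input with true label 1
helper_x, helper_y = make_helper(
    x, y, w_robust, lambda p: predict(w_ref, b_ref, p), eps=0.1
)
```

In this toy setup the helper point crosses the reference model's decision boundary, so it receives label 0 although the clean input has label 1. Training on such a point discourages the classifier from extending the margin that far along the adversarial direction, which is the effect the abstract describes.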
