Virtual presentation / poster accept
Extracting Robust Models with Uncertain Examples
Guanlin Li · Guowen Xu · Shangwei Guo · Han Qiu · Jiwei Li · Tianwei Zhang
Keywords: [ Deep Learning and representational learning ]
Model extraction attacks are proven to be a severe privacy threat to Machine Learning as a Service (MLaaS). A variety of techniques have been designed to steal a remote machine learning model with high accuracy and fidelity. However, how to extract a robust model with similar resilience against adversarial attacks is never investigated. This paper presents the first study toward this goal. We first analyze those existing extraction solutions either fail to maintain the model accuracy or model robustness or lead to the robust overfitting issue. Then we propose Boundary Entropy Searching Thief (BEST), a novel model extraction attack to achieve both accuracy and robustness extraction under restricted attack budgets. BEST generates a new kind of uncertain examples for querying and reconstructing the victim model. These samples have uniform confidence scores across different classes, which can perfectly balance the trade-off between model accuracy and robustness. Extensive experiments demonstrate that BEST outperforms existing attack methods over different datasets and model architectures under limited data. It can also effectively invalidate state-of-the-art extraction defenses.