ICLR Poster AttEXplore: Attribution for Explanation with model parameters eXploration

Zhiyu Zhu · Huaming Chen · Jiayu Zhang · Xinyi Wang · Zhibo Jin · Jason Xue · Flora Salim

Halle B

Abstract:

Deep Neural Networks (DNNs) have achieved state-of-the-art performance in various application scenarios. However, because of real-world noise and human-added perturbations, the trustworthiness of DNNs has become a critical security concern. It is therefore imperative to provide explainability for the decisions made by these non-linear, complex parameterized models. Given the diverse decision boundaries across models and tasks, attribution methods are promising for this goal, yet their performance can be further improved. In this paper, we show for the first time that the decision boundary exploration approaches used in attribution are consistent with the process used in transferable adversarial attacks. Utilizing this consistency, we introduce a novel attribution method based on model parameter exploration. Furthermore, inspired by the capability of frequency exploration to investigate model parameters, we provide enhanced explainability for DNN models by manipulating the input features based on frequency information to explore the decision boundaries of different models. Large-scale experiments demonstrate that our Attribution method for Explanation with model parameter eXploration (AttEXplore) outperforms other state-of-the-art interpretability methods. Moreover, by employing other transferable attack techniques, AttEXplore can explore potential variations in attribution outcomes. Our code is available at: https://anonymous.4open.science/r/AMPE-6C32/.
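To make the idea of manipulating input features based on frequency information more concrete, the following is a minimal sketch and not the AttEXplore algorithm itself. It swaps in a SmoothGrad-style gradient average, where each sample is produced by randomly rescaling the input's discrete Fourier coefficients instead of adding Gaussian noise. Everything here is an assumption made for illustration: the toy logistic model, the 1-D input, the uniform rescaling scheme, and the names `frequency_perturb`, `toy_model_grad`, and `averaged_attribution`.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform of a real-valued list."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Naive inverse DFT; keeps only the real part of each output sample."""
    n = len(coeffs)
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def frequency_perturb(x, rng, scale):
    """Hypothetical scheme: rescale every frequency coefficient by a
    random factor drawn uniformly from [1 - scale, 1 + scale]."""
    coeffs = dft(x)
    factors = [1.0 + rng.uniform(-scale, scale) for _ in coeffs]
    return idft([f * c for f, c in zip(factors, coeffs)])


def toy_model_grad(w, x):
    """Gradient of sigmoid(w . x) with respect to x (toy stand-in for a DNN)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    s = 1.0 / (1.0 + math.exp(-z))
    return [s * (1.0 - s) * wi for wi in w]


def averaged_attribution(w, x, n_samples=20, scale=0.1, seed=0):
    """Average the gradient over frequency-perturbed copies of x,
    then multiply element-wise by the original input."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        x_pert = frequency_perturb(x, rng, scale)
        grad = toy_model_grad(w, x_pert)
        total = [t + g for t, g in zip(total, grad)]
    return [(t / n_samples) * xi for t, xi in zip(total, x)]


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 2.0]   # assumed toy weights
    x = [1.0, 0.5, -2.0, 0.1]    # assumed toy input
    print(averaged_attribution(w, x))
```

With `scale=0.0` the perturbation leaves the input unchanged, so the result reduces to plain gradient-times-input; larger values of `scale` sample gradients from a wider neighbourhood of the input. For the authors' actual procedure, refer to the code repository linked in the abstract.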
