Poster in Workshop: Pitfalls of limited data and computation for Trustworthy ML
DORA: Exploring outlier representations in Deep Neural Networks
Kirill Bykov · Mayukh Deb · Dennis Grinwald · Klaus-Robert Müller · Marina Höhne
Deep Neural Networks (DNNs) draw their power from the representations they learn. However, while being incredibly effective at learning complex abstractions, they are also susceptible to learning malicious artifacts due to spurious correlations inherent in the training data. In this paper, we introduce DORA (Data-agnOstic Representation Analysis): the first data-agnostic framework for analyzing the representation space of DNNs. We propose a novel distance measure between representations that utilizes the self-explaining capabilities of the network itself, and we quantitatively validate its alignment with human-defined semantic distance. We further demonstrate that this metric can be used to detect anomalous representations, which bear the risk of encoding unintended spurious concepts that deviate from the desired decision-making policy. Finally, we demonstrate the practical utility of DORA by analyzing and identifying artifactual representations in widely popular Computer Vision networks.
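To make the described pipeline concrete, below is a minimal sketch of a DORA-style analysis. It is not the authors' implementation: it assumes a pretrained torchvision ResNet-18, treats the output channels of one layer as the representations under study, approximates the self-explaining signals with naive gradient-ascent Activation-Maximisation on random noise (hence data-agnostic), and substitutes scikit-learn's IsolationForest for the paper's outlier-detection step. All helper names are hypothetical.

```python
# Minimal DORA-style sketch (hypothetical, not the authors' code). Neurons are
# the output channels of one layer; their "self-explanations" are approximated
# by gradient-ascent Activation-Maximisation signals (s-AMS) grown from random
# noise, so no data is required.

import torch
import torch.nn.functional as F
import torchvision.models as models
from sklearn.ensemble import IsolationForest

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in model.parameters():
    p.requires_grad_(False)  # optimize the input only, keep weights fixed

# Cache activations of the layer under analysis via a forward hook.
acts = {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(feat=o))
N_NEURONS = 512  # channels in resnet18's layer4 output

def s_ams(neuron: int, steps: int = 64, lr: float = 0.05) -> torch.Tensor:
    """Synthesize an input that maximally activates one channel (data-agnostic)."""
    x = torch.randn(1, 3, 224, 224, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        (-acts["feat"][0, neuron].mean()).backward()  # gradient ascent
        opt.step()
    return x.detach()

# Representation activation vectors: the response of *every* neuron to *each*
# neuron's s-AMS; row i characterizes what neuron i encodes.
ravs = torch.zeros(N_NEURONS, N_NEURONS)
for i in range(N_NEURONS):
    signal = s_ams(i)
    with torch.no_grad():
        model(signal)
        ravs[i] = acts["feat"][0].mean(dim=(1, 2))

# Distance between representations: 1 - cosine similarity of their vectors.
unit = F.normalize(ravs, dim=1)
dist = 1.0 - unit @ unit.T

# Flag representations with anomalous distance profiles as outlier candidates.
flags = IsolationForest(random_state=0).fit_predict(dist.numpy())
print("candidate outlier neurons:", [i for i, f in enumerate(flags) if f == -1])
```

The sketch is meant to show the structure of the method (signals → representation activation vectors → pairwise distances → outlier detection), not an efficient or faithful implementation; looping gradient ascent over all 512 channels is slow, and the paper's own distance measure and detection procedure should be consulted for details.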