Poster
in
Workshop: PAIR^2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data
Post-hoc Concept Bottleneck Models
Mert Yuksekgonul · Maggie Wang · James Y Zou
Concept Bottleneck Models (CBMs) map the inputs onto a concept bottleneck and use the bottleneck to make a prediction. A concept bottleneck enhances interpretability since it can be investigated to understand what the model sees in an input, and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require concept labels during training to learn the bottleneck. Additionally, it is questionable if CBMs can match the accuracy of an unrestricted neural network trained on a given domain, potentially reducing the incentive to deploy them in practice. In this work, we address these two key limitations by introducing Post-hoc Concept Bottleneck models (P-CBMs). We show that we can turn any neural network into a P-CBM, without sacrificing model performance and retaining interpretability benefits. Finally, we show that P-CBMs can provide significant performance gains with model editing without any fine-tuning and needing data from the target domain.