Network Dissection: Quantifying Interpretability of Deep Visual Representations

David Bau*, Bolei Zhou*, Aditya Khosla, Aude Oliva, Antonio Torralba
Massachusetts Institute of Technology

[Figure: selected units from ResNet (res5c), GoogLeNet (inception_4e, inception_5b), and VGG (conv4_3, conv5_3) that detect concepts such as house, dog, train, plant, and airplane; each unit is labeled with the IoU between its thresholded activation and the concept segmentation (e.g., res5c unit 1410, IoU=0.142).]

Selected units are shown from three state-of-the-art network architectures trained to classify images of places (Places365). Many individual units respond to specific high-level concepts (object segmentations) that are not directly represented in the training set (scene classifications).

Why we study interpretable units

Interpretable units are interesting because they hint that deep networks may not be completely opaque black boxes.

However, the observations of interpretability up to now are just a hint: there is not yet a complete understanding of whether or how interpretable units are evidence of a so-called disentangled representation.

[Figure: AlexNet-Places205 conv5 unit 138 (heads), unit 215 (castles), unit 13 (lamps), and unit 53 (stairways).]

What is Network Dissection?

Our paper investigates three questions:

  1. What is a disentangled representation, and how can its factors be quantified and detected?
  2. Do interpretable hidden units reflect a special alignment of feature space, or are interpretations a chimera?
  3. What conditions in state-of-the-art training lead to representations with greater or lesser entanglement?

Network Dissection is our method for quantifying interpretability of individual units in a deep CNN (i.e., our answer to question #1). It works by measuring the alignment between unit response and a set of concepts drawn from a broad and dense segmentation data set called Broden.
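For concreteness, here is a minimal NumPy sketch of the core scoring step: a unit's activation maps are thresholded at a fixed top quantile over the probe dataset, upsampled to the resolution of the Broden segmentation masks, and scored against a concept by intersection-over-union. The array shapes, the function name unit_concept_iou, and the use of scipy.ndimage.zoom are illustrative assumptions, not the released code.

    # A minimal sketch of the Network Dissection scoring step (not the released code).
    import numpy as np
    from scipy.ndimage import zoom

    def unit_concept_iou(activations, concept_masks, quantile=0.005):
        """Score one unit against one concept with a dataset-wide IoU.

        activations:   float array, shape (num_images, h, w)  -- one unit's activation maps
        concept_masks: bool array,  shape (num_images, H, W)  -- Broden concept segmentations
        """
        # Threshold T_k chosen so that P(a_k > T_k) = quantile over all spatial
        # locations in the probe dataset (top-quantile activation threshold).
        threshold = np.quantile(activations, 1.0 - quantile)

        num_images, H, W = concept_masks.shape
        intersection, union = 0.0, 0.0
        for act, mask in zip(activations, concept_masks):
            # Upsample the low-resolution activation map to mask resolution,
            # then binarize it at the dataset-wide threshold.
            scaled = zoom(act, (H / act.shape[0], W / act.shape[1]), order=1)
            unit_mask = scaled > threshold
            intersection += np.logical_and(unit_mask, mask).sum()
            union += np.logical_or(unit_mask, mask).sum()
        return intersection / union if union > 0 else 0.0

    # A unit is reported as a detector for the concept with the highest IoU,
    # provided that IoU exceeds a small cutoff (0.04 in the paper).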

Are interpretations a chimera?

Network Dissection shows that interpretable units correspond to particular, non-random orientations of representation space. Their emergence is evidence that the network is learning a decomposition into intermediate concepts, answering question #2.

Interpretability drops as the representation is gradually rotated toward a random basis. Contradicting the prevailing wisdom, interpretability is not isotropic in representation space, and networks do appear to learn axis-aligned decompositions.
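The sketch below illustrates one way to run that experiment: build a smooth family of random channel rotations Q(alpha), with the identity at alpha = 0 and a random rotation at alpha = 1, mix the units by Q(alpha), and re-run the same IoU scoring on the rotated units. The skew-symmetric construction of the rotation family, the function names, and the reuse of unit_concept_iou from the previous sketch are assumptions for illustration; the released code may implement the interpolation differently.

    # A hedged sketch of the basis-rotation experiment (not the released code).
    import numpy as np
    from scipy.linalg import expm

    def random_rotation_family(num_units, alpha, seed=0):
        """Orthogonal matrix interpolating identity (alpha=0) -> random rotation (alpha=1)."""
        rng = np.random.default_rng(seed)
        G = rng.standard_normal((num_units, num_units))
        A = (G - G.T) / 2.0       # random skew-symmetric generator
        return expm(alpha * A)    # matrix exponential of skew-symmetric A is orthogonal

    def rotated_activations(activations, alpha, seed=0):
        """Mix channels of (num_images, num_units, h, w) activations by Q(alpha)."""
        Q = random_rotation_family(activations.shape[1], alpha, seed)
        # Each rotated "unit" is a linear combination of the original units.
        return np.einsum('uv,nvhw->nuhw', Q, activations)

    # Re-running the IoU scoring on rotated_activations(acts, alpha) for increasing
    # alpha shows the number of unique detectors dropping, which is the evidence
    # that interpretability is tied to the learned (axis-aligned) basis.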

What affects interpretability?

This brings us to question #3: what conditions lead to higher or lower levels of interpretability?

In terms of architecture, we find interpretability of ResNet > VGG > GoogLeNet > AlexNet, and in terms of primary training task, Places365 > Places205 > ImageNet.
Interpretability varies widely under a range of self-supervised tasks, and none approaches the interpretability obtained from supervised training on ImageNet or Places.

The code you find here will let you reproduce our interpretability benchmarks, and will allow you to measure and find ways to improve interpretability in your own deep CNNs.

Network Dissection Results

Previous Work

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. "Object Detectors Emerge in Deep Scene CNNs." International Conference on Learning Representations (ICLR), 2015. [PDF][Code]
Comment: In this work we analyzed the interpretable object detectors that emerge inside CNNs trained to classify scenes.

B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. "Learning Deep Features for Discriminative Localization." Computer Vision and Pattern Recognition (CVPR), 2016. [PDF][Webpage][Code]
Comment: In this work we leveraged the internal representation of classification CNNs for weakly-supervised localization.

Reference

D. Bau*, B. Zhou*, A. Khosla, A. Oliva, and A. Torralba. "Network Dissection: Quantifying Interpretability of Deep Visual Representations." Computer Vision and Pattern Recognition (CVPR), 2017. Oral. [PDF][Code]

(*first two authors contributed equally.)

@inproceedings{netdissect2017,
  title={Network Dissection: Quantifying Interpretability of Deep Visual Representations},
  author={Bau, David and Zhou, Bolei and Khosla, Aditya and Oliva, Aude and Torralba, Antonio},
  booktitle={Computer Vision and Pattern Recognition},
  year={2017}
}

Acknowledgement: This work was partly supported by the National Science Foundation under Grant No. 1524817 to A.T.; the Vannevar Bush Faculty Fellowship program sponsored by the Basic Research Office of the Assistant Secretary of Defense for Research and Engineering and funded by the Office of Naval Research through grant N00014-16-1-3116 to A.O.; the MIT Big Data Initiative at CSAIL, the Toyota Research Institute MIT CSAIL Joint Research Center, Google and Amazon Awards, and a hardware donation from NVIDIA Corporation. B.Z. is supported by a Facebook Fellowship.