Visualizing Classification Structure of Large-Scale Classifiers

We propose a measure to compute class similarity in large-scale classification based on prediction scores. Such measure has not been formally pro-posed in the literature. We show how visualizing the class similarity matrix can reveal hierarchical structures and relationships that govern the classes. Through examples with various classifiers, we demonstrate how such structures can help in analyzing the classification behavior and in inferring potential corner cases. The source code for one example is available as a notebook at this https URL

[1]  Hod Lipson,et al.  Convergent Learning: Do different neural networks learn the same representations? , 2015, FE@NIPS.

[2]  Samy Bengio,et al.  ADIOS: Architectures Deep In Output Space , 2016, ICML.

[3]  Manik Varma,et al.  FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning , 2014, KDD.

[4]  Rich Caruana,et al.  Model compression , 2006, KDD '06.

[5]  Andrew Zisserman,et al.  Deep Face Recognition , 2015, BMVC.

[6]  Xiaoming Liu,et al.  Do Convolutional Neural Networks Learn Class Hierarchy? , 2017, IEEE Transactions on Visualization and Computer Graphics.

[7]  Andrew Zisserman,et al.  Incremental learning of object detectors using a visual shape alphabet , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[8]  Daniel A. Keim,et al.  Visual Boosting in Pixel‐based Visualizations , 2011, Comput. Graph. Forum.

[9]  Jean-Daniel Fekete,et al.  Matrix Reordering Methods for Table and Network Visualization , 2016, Comput. Graph. Forum.

[10]  Qiang Ji,et al.  Multi-label learning with missing labels for image annotation and facial action unit recognition , 2015, Pattern Recognit..

[11]  Samy Bengio,et al.  Large-Scale Object Classification Using Label Relation Graphs , 2014, ECCV.

[12]  Yuhong Guo,et al.  Zero-Shot Classification with Discriminative Semantic Representation Learning , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Joachim M. Buhmann,et al.  Going Metric: Denoising Pairwise Data , 2002, NIPS.

[14]  Angelo Arleo,et al.  Optimal context separation of spiking haptic signals by second-order somatosensory neurons , 2009, NIPS.

[15]  Yu-Chiang Frank Wang,et al.  A Closer Look at Few-shot Classification , 2019, ICLR.

[16]  Shimon Ullman,et al.  Uncovering shared structures in multiclass classification , 2007, ICML '07.

[17]  Jascha Sohl-Dickstein,et al.  SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability , 2017, NIPS.

[18]  Zhen Li,et al.  Blockout: Dynamic Model Selection for Hierarchical Deep Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Duen Horng Chau,et al.  Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations , 2019, IEEE Transactions on Visualization and Computer Graphics.

[20]  Adam S. Charles,et al.  Visualizing the PHATE of Neural Networks , 2019, NeurIPS.

[21]  Grigorios Tsoumakas,et al.  Effective and Efficient Multilabel Classification in Domains with Large Number of Labels , 2008 .

[22]  Bolei Zhou,et al.  Places: A 10 Million Image Database for Scene Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Medha Katehara,et al.  Prediction Scores as a Window into Classifier Behavior , 2017, ArXiv.

[24]  Kilian Q. Weinberger,et al.  On Calibration of Modern Neural Networks , 2017, ICML.

[25]  Björn Ommer,et al.  CliqueCNN: Deep Unsupervised Exemplar Learning , 2016, NIPS.

[26]  Hanspeter Pfister,et al.  HiPiler: Visual Exploration of Large Genome Interaction Matrices with Interactive Small Multiples , 2017, bioRxiv.

[27]  Aren Jansen,et al.  CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[28]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[29]  Avanti Shrikumar,et al.  Learning Important Features Through Propagating Activation Differences , 2017, ICML.

[30]  Aren Jansen,et al.  Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[31]  Thomas L. Griffiths,et al.  Parametric Embedding for Class Visualization , 2004, Neural Computation.

[32]  Martial Hebert,et al.  Learning to Model the Tail , 2017, NIPS.

[33]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[34]  Martin Wattenberg,et al.  Visualizing and Measuring the Geometry of BERT , 2019, NeurIPS.

[35]  Fei Yin,et al.  CASIA Online and Offline Chinese Handwriting Databases , 2011, 2011 International Conference on Document Analysis and Recognition.

[36]  Bilal Alsallakh,et al.  Understanding the Error Structure as a Key to Regularize Convolutional Neural Networks , 2018 .

[37]  Jason Yosinski,et al.  Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks , 2016, ArXiv.