The effect of task and training on intermediate representations in convolutional neural networks revealed with modified RV similarity analysis

Centered Kernel Alignment (CKA) was recently proposed as a similarity metric for comparing activation patterns in deep networks. Here we experiment with the modified RV coefficient (RV2), which has properties very similar to CKA's while being less sensitive to dataset size. We compare layer representations across networks whose layers received varying amounts of training: a standard-trained network (all parameters updated at every step), a freeze-trained network (layers gradually frozen during training), partially random networks (only some layers trained), and a completely untrained network. We found that RV2 recovered the expected similarity patterns and yielded interpretable similarity matrices that suggested hypotheses about how representations are affected by different training recipes. We propose that the superior performance achieved by freeze training can be attributed to representational differences in the penultimate layer. Our comparisons of the partially random networks suggest that the inputs and targets anchor the representations in the lowest and highest layers, respectively.
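
For readers who want to reproduce the comparison, below is a minimal NumPy sketch of the modified RV coefficient (RV2), following the standard definition (cross-product matrices with their diagonals zeroed), with linear CKA alongside for reference. The function names and the plain-NumPy formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def center(X):
    """Column-center an (n_examples, n_features) activation matrix."""
    return X - X.mean(axis=0, keepdims=True)

def rv2(X, Y):
    """Modified RV coefficient between two activation matrices that
    share the same rows (examples) but may differ in width (units)."""
    X, Y = center(X), center(Y)
    A = X @ X.T
    B = Y @ Y.T
    # The modification: zero the diagonals so each example's
    # self-similarity does not inflate the coefficient.
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(B, 0.0)
    # trace(A @ B) for symmetric A, B equals the elementwise sum.
    return np.sum(A * B) / np.sqrt(np.sum(A * A) * np.sum(B * B))

def linear_cka(X, Y):
    """Linear CKA (Kornblith et al., 2019), for comparison with RV2."""
    X, Y = center(X), center(Y)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro")
                   * np.linalg.norm(Y.T @ Y, "fro"))

# Example: compare two layers' activations on the same 500 inputs
# (random data here, purely to show the calling convention).
rng = np.random.default_rng(0)
acts_a = rng.standard_normal((500, 256))  # layer A: 500 examples x 256 units
acts_b = rng.standard_normal((500, 128))  # layer B: 500 examples x 128 units
print(rv2(acts_a, acts_b), linear_cka(acts_a, acts_b))
```

Both metrics compare examples-by-examples similarity structure, so the two layers need not have the same number of units, only the same set of input examples.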
