Intrinsic dimension of data representations in deep neural networks

Deep neural networks progressively transform their inputs across multiple processing layers. What are the geometrical properties of the representations learned by these networks? Here we study the intrinsic dimensionality (ID) of data representations, i.e., the minimal number of parameters needed to describe a representation. We find that, in a trained network, the ID is orders of magnitude smaller than the number of units in each layer. Across layers, the ID first increases and then progressively decreases in the final layers. Remarkably, the ID of the last hidden layer predicts classification accuracy on the test set. These results cannot be obtained with linear dimensionality estimates (e.g., principal component analysis), nor do they appear in representations that have been artificially linearized. They are also absent in untrained networks and in networks trained on randomized labels. This suggests that neural networks that can generalize are those that transform the data into low-dimensional, but not necessarily flat, manifolds.
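The contrast between a nonlinear ID estimate and a linear one can be illustrated with a small sketch. The abstract does not say which estimator is used, so everything here is an assumption made for illustration: the code uses the maximum-likelihood form of the two-nearest-neighbour (TwoNN) estimator of Facco et al. (2017), a toy curved surface stands in for a layer's activations, and the function names `twonn_id` and `pca_dim` are hypothetical.

```python
import numpy as np


def twonn_id(x):
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form).

    Assumption: the ratio mu = r2 / r1 of second- to first-neighbour
    distances follows a Pareto law with exponent equal to the ID.
    """
    x = np.asarray(x, dtype=float)
    # Pairwise squared Euclidean distances; fine for a few thousand points.
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # exclude each point's distance to itself
    # The two smallest distances in every row: columns 0 and 1 after partition.
    nearest = np.sqrt(np.partition(d2, 1, axis=1)[:, :2])
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0                    # drop exact duplicates
    mu = r2[keep] / r1[keep]
    return keep.sum() / np.sum(np.log(mu))


def pca_dim(x, var_fraction=0.9):
    """Linear estimate: number of principal components needed to reach
    the given fraction of total variance (the 0.9 threshold is arbitrary)."""
    x = np.asarray(x, dtype=float)
    s = np.linalg.svd(x - x.mean(axis=0), compute_uv=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(explained, var_fraction) + 1)


# Toy "representation": a curved two-parameter surface (a torus) embedded
# in 12 coordinates. Two parameters (u, v) describe every point, yet the
# surface is not flat, so it spreads across all 12 linear directions.
rng = np.random.default_rng(0)
u = rng.uniform(0.0, 2.0 * np.pi, size=2000)
v = rng.uniform(0.0, 2.0 * np.pi, size=2000)
features = []
for k in (1, 2, 3):
    features += [np.cos(k * u), np.sin(k * u), np.cos(k * v), np.sin(k * v)]
acts = np.stack(features, axis=1)    # shape (2000, 12)

id_est = twonn_id(acts)
n_pc = pca_dim(acts)
print(f"TwoNN ID estimate: {id_est:.2f}; PCA components for 90% variance: {n_pc}")
```

On this toy surface the nearest-neighbour estimate should come out close to 2, the number of parameters that generated the data, while the variance-based count stays near the embedding dimension. That is the sense in which a manifold can be low-dimensional without being flat.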
