Learning Dynamics of Deep Networks Admit Low-Rank Tensor Descriptions

Deep feedforward neural networks are associated with complicated, nonconvex objective functions. Yet, simple optimization algorithms can identify parameters that generalize well to held-out data. We currently lack detailed descriptions of this learning process, even at a qualitative level. We propose a simple tensor decomposition model to study how hidden representations evolve over learning. This approach precisely recovers the known learning dynamics of linear networks, which admit closed-form solutions. On deep, nonlinear architectures performing image classification (CIFAR-10), we find empirically that a low-rank tensor model explains a large fraction of the variance while extracting meaningful structure, such as stage-like learning and selectivity to inputs.
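To make the proposed analysis concrete, below is a minimal sketch (not the authors' code) of fitting a low-rank CP/PARAFAC model to learning dynamics with the tensorly library. Hidden-layer responses to a fixed set of probe inputs are stacked across training checkpoints into a 3-way tensor (checkpoints x inputs x units); the tensor shape, rank, and random placeholder activations are illustrative assumptions, and in practice the tensor would be built from saved forward passes during training.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

rng = np.random.default_rng(0)
n_checkpoints, n_inputs, n_units, rank = 50, 100, 64, 5  # illustrative sizes

# Placeholder data: activations[t, i, :] would hold a hidden layer's
# response to probe input i at training checkpoint t.
activations = rng.standard_normal((n_checkpoints, n_inputs, n_units))

# Fit a rank-R CP model: activations ~ sum_r a_r (x) b_r (x) c_r, where
# a_r traces dynamics over learning, b_r captures input selectivity,
# and c_r gives per-unit loadings.
weights, factors = parafac(tl.tensor(activations), rank=rank, n_iter_max=200)
time_factors, input_factors, unit_factors = factors

# Fraction of variance explained by the low-rank reconstruction.
recon = tl.cp_to_tensor((weights, factors))
r2 = 1 - np.linalg.norm(activations - recon) ** 2 / np.linalg.norm(activations) ** 2
print(f"rank-{rank} CP model explains {r2:.1%} of variance")
```

Under this factorization, stage-like learning would appear as time factors that switch on at distinct epochs, and input selectivity as input factors concentrated on particular classes.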
