Extended Unconstrained Features Model for Exploring Deep Neural Collapse

A common practice when training deep neural networks for classification is to keep optimizing the network’s weights even after the training error vanishes, in order to push the training loss further toward zero. Recently, a phenomenon termed “neural collapse” (NC) has been observed empirically in this training regime. Specifically, it has been shown that the learned features (the outputs of the penultimate layer) of within-class samples converge to their mean, and that the class means exhibit a certain tight frame structure that is also aligned with the weights of the last layer. Recent papers have shown that minimizers with this structure emerge when a simplified “unconstrained features model” (UFM) is optimized with a regularized cross-entropy loss. In this paper, we further analyze and extend the UFM. First, we study the UFM for the regularized MSE loss and show that the minimizers’ features can have a more delicate structure than in the cross-entropy case, which also affects the structure of the weights. Then, we extend the UFM by adding another layer of weights and a ReLU nonlinearity to the model, and generalize our previous results. Finally, we empirically demonstrate the usefulness of our nonlinear extended UFM in modeling the NC phenomenon that occurs in practical networks.
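To make the setup concrete, the following is a minimal sketch (not the paper’s code) of the unconstrained features model with a regularized MSE loss and of its extension with one extra weight layer and a ReLU, written in PyTorch. The dimensions, the shared regularization weight lam, the Adam-based full-batch optimization loop, and the within-class-variability check are all illustrative assumptions rather than the paper’s exact formulation.

import torch

torch.manual_seed(0)
K, d, n = 4, 16, 20                          # classes, feature dim, samples per class
N = K * n
Y = torch.eye(K).repeat_interleave(n, dim=1) # one-hot targets, shape (K, N)
lam = 5e-3                                   # shared regularization weight (assumed)

def within_class_variability(H):
    """Rough NC1 proxy: mean squared distance of features to their class means."""
    Hc = H.detach().reshape(d, K, n)         # columns are grouped by class
    return (Hc - Hc.mean(dim=2, keepdim=True)).pow(2).mean().item()

# Plain UFM: the classifier (W, b) and the feature matrix H are the free variables.
W = torch.randn(K, d, requires_grad=True)
H = torch.randn(d, N, requires_grad=True)
b = torch.zeros(K, 1, requires_grad=True)
opt = torch.optim.Adam([W, H, b], lr=1e-2)
for _ in range(10000):
    opt.zero_grad()
    loss = 0.5 / N * ((W @ H + b - Y) ** 2).sum() \
         + 0.5 * lam * (W.pow(2).sum() + H.pow(2).sum() + b.pow(2).sum())
    loss.backward()
    opt.step()
print("UFM within-class variability:", within_class_variability(H))

# Extended UFM (sketch): one extra weight layer and a ReLU produce the features,
# H = relu(W1 @ H1), where the "pre-features" H1 are now the free variable.
d1 = 16
W1 = torch.randn(d, d1, requires_grad=True)
H1 = torch.randn(d1, N, requires_grad=True)
W2 = torch.randn(K, d, requires_grad=True)
b2 = torch.zeros(K, 1, requires_grad=True)
opt2 = torch.optim.Adam([W1, H1, W2, b2], lr=1e-2)
for _ in range(10000):
    opt2.zero_grad()
    feats = torch.relu(W1 @ H1)
    loss = 0.5 / N * ((W2 @ feats + b2 - Y) ** 2).sum() \
         + 0.5 * lam * (W1.pow(2).sum() + H1.pow(2).sum()
                        + W2.pow(2).sum() + b2.pow(2).sum())
    loss.backward()
    opt2.step()
print("Extended UFM within-class variability:",
      within_class_variability(torch.relu(W1 @ H1)))

In both runs the within-class variability of the learned features should shrink markedly relative to its initial value, which is the collapse behavior (NC1) the abstract refers to; how closely it approaches zero depends on the assumed regularization weight and the number of optimization steps.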
