Optimizing Mode Connectivity via Neuron Alignment

The loss landscapes of deep neural networks are not well understood due to their high nonconvexity. Empirically, the local minima of these loss functions can be connected by a learned curve in model space along which the loss remains nearly constant, a phenomenon known as mode connectivity. Yet current curve-finding algorithms do not account for the symmetry in the loss surface created by model weight permutations. We propose a more general framework for investigating the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected. To approximate the optimal permutation, we introduce an inexpensive heuristic referred to as neuron alignment, which promotes similarity between the distributions of intermediate activations of the models along the curve. We provide theoretical analysis establishing the benefit of alignment to mode connectivity based on this simple heuristic. We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme, and we find that optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that generalizes well. Our alignment method can also significantly alleviate the recently identified robust loss barrier along the path connecting two adversarially robust models, and it finds models along that path that are both more robust and more accurate.
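
To make the alignment heuristic concrete, the sketch below shows one plausible layer-wise implementation: compute the cross-correlation between the post-activation outputs of corresponding layers of the two networks on a shared batch, then solve a linear assignment problem for the permutation that maximizes total correlation. The function name align_neurons, the normalization details, and the use of SciPy's Hungarian solver are illustrative assumptions rather than the paper's exact procedure; in a full pipeline the returned permutation would be applied to the second network's layer weights (and to the input dimension of the following layer) before the connecting curve is trained.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_neurons(acts_a, acts_b):
    """Find a permutation of model B's neurons that best matches model A's.

    acts_a, acts_b: arrays of shape (n_samples, n_neurons) holding the
    post-activation outputs of corresponding layers on the same input batch.
    Returns an index array `perm` such that acts_b[:, perm] aligns with acts_a.
    """
    # Center and normalize each neuron's activations so that the inner
    # product between columns reduces to a cross-correlation.
    a = acts_a - acts_a.mean(axis=0)
    b = acts_b - acts_b.mean(axis=0)
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + 1e-12)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + 1e-12)

    # corr[i, j]: correlation between neuron i of model A and neuron j of model B.
    corr = a.T @ b

    # Solve the linear assignment problem (Hungarian algorithm) to maximize
    # the total correlation of matched neuron pairs.
    _, perm = linear_sum_assignment(corr, maximize=True)
    return perm
```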
