Visualising Basins of Attraction for the Cross-Entropy and the Squared Error Neural Network Loss Functions

Quantifying the stationary points and the associated basins of attraction of neural network loss surfaces is an important step towards a better understanding of these surfaces at large. This work proposes a novel method to visualise basins of attraction, together with the associated stationary points, via gradient-based random sampling. The proposed technique is used to perform an empirical study of the loss surfaces generated by two different error metrics: the quadratic (squared error) loss and the entropic (cross-entropy) loss. The empirical observations confirm the theoretical hypothesis regarding the nature of neural network attraction basins. Entropic loss is shown to exhibit stronger gradients and fewer stationary points than quadratic loss, indicating that entropic loss has a more searchable landscape. Quadratic loss, in turn, is shown to be more resilient to overfitting than entropic loss. Both losses exhibit local minima, but the number of local minima decreases as the dimensionality increases. Thus, the proposed visualisation technique successfully captures the local minima properties exhibited by neural network loss surfaces, and can be used for fitness landscape analysis of neural networks.
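
To make the sampling idea concrete, the following is a minimal sketch (not the authors' implementation) of a gradient-based random walk over the weight space of a tiny network, recording the loss value and gradient magnitude at each step for both the quadratic and the entropic loss. The XOR data set, the numerical gradient, the step-size bound, and names such as `gradient_walk` are illustrative assumptions rather than details taken from the paper.

```python
# Hedged sketch: a gradient-based random walk over a tiny 2-2-1 network,
# recording (loss, gradient norm) so basins of attraction and near-stationary
# points can later be visualised. All specifics here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic test case in neural network loss surface studies.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unpack(w, n_in=2, n_hid=2):
    """Split a flat weight vector into layer matrices (biases omitted for brevity)."""
    W1 = w[:n_in * n_hid].reshape(n_in, n_hid)
    W2 = w[n_in * n_hid:].reshape(n_hid, 1)
    return W1, W2

def loss(w, kind):
    W1, W2 = unpack(w)
    p = sigmoid(sigmoid(X @ W1) @ W2)
    if kind == "quadratic":            # squared error loss
        return np.mean((p - y) ** 2)
    eps = 1e-12                        # entropic (cross-entropy) loss
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def numeric_grad(w, kind, h=1e-5):
    """Central-difference gradient; analytic gradients would work equally well."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w); e[i] = h
        g[i] = (loss(w + e, kind) - loss(w - e, kind)) / (2 * h)
    return g

def gradient_walk(kind, n_steps=200, max_step=0.5, dim=6):
    """Random-start walk that follows the negative gradient with a random,
    bounded step size, recording loss and gradient norm along the way."""
    w = rng.uniform(-1.0, 1.0, size=dim)
    trace = []
    for _ in range(n_steps):
        g = numeric_grad(w, kind)
        gnorm = np.linalg.norm(g)
        trace.append((loss(w, kind), gnorm))
        if gnorm < 1e-8:               # treat as an approximate stationary point
            break
        step = rng.uniform(0.0, max_step)
        w = w - step * g / gnorm       # bounded step in the descent direction
    return np.array(trace)

for kind in ("quadratic", "entropic"):
    t = gradient_walk(kind)
    print(f"{kind:9s}  final loss {t[-1, 0]:.4f}  final |grad| {t[-1, 1]:.4f}")
```

Plotting the recorded loss values against the gradient norms from many such walks gives a scatter-plot view of the basins visited and of the near-stationary points, which is the kind of visualisation the abstract describes.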
