A survey of factors influencing MLP error surface

Visualization of neural network error surfaces and learning trajectories helps to understand how numerous factors influence the neural learning process. This understanding can be used to improve the training and design of multilayer perceptron (MLP) networks. The following topics are discussed, using a few benchmark datasets for illustration: general error surface properties, including local minima, plateaus and narrow funnels; their dependence on network structure, input data, and transfer and error functions; the consequences of weight initialization; and interesting directions in the weight space. The error surfaces are shown in 3-dimensional PCA-based projections. Finally, the possibility of effectively reducing the number of weights is discussed.
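A minimal sketch of how a 3-dimensional PCA-based projection of an error surface can be produced, assuming the common recipe (the abstract does not spell out the exact procedure): record the weight vector after every training step, find the first two principal directions of that trajectory, and evaluate the error on a grid in the plane they span through the trajectory mean, so that the two PCA coordinates and the error form the three plotted axes. The XOR data, the 2-2-1 sigmoid network, plain gradient descent with a finite-difference gradient, the learning rate, and the helper names `train`, `pca` and `error_surface` are all illustrative assumptions, not the paper's benchmarks or code. Power iteration in pure Python stands in for a linear-algebra library only to keep the example dependency-free.

```python
import math
import random

# Hypothetical toy setup (not the paper's benchmarks): XOR data, 2-2-1 sigmoid MLP.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_IN, N_HID = 2, 2
N_W = N_HID * (N_IN + 1) + (N_HID + 1)  # 9 weights, biases included

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(w, x):
    """Output of the 2-2-1 MLP for input x; w is a flat list of N_W weights."""
    hidden = []
    for j in range(N_HID):
        b = j * (N_IN + 1)
        hidden.append(sigmoid(w[b + N_IN] + sum(w[b + i] * x[i] for i in range(N_IN))))
    b = N_HID * (N_IN + 1)
    return sigmoid(w[b + N_HID] + sum(w[b + j] * hidden[j] for j in range(N_HID)))

def mse(w):
    """Mean squared error over DATA: the height of the error surface at w."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)

def train(seed=0, steps=300, lr=0.5, h=1e-5):
    """Assumed trainer: gradient descent with a central-difference gradient; returns all visited weights."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(N_W)]
    path = [list(w)]
    for _ in range(steps):
        grad = []
        for k in range(N_W):
            wp, wm = list(w), list(w)
            wp[k] += h
            wm[k] -= h
            grad.append((mse(wp) - mse(wm)) / (2 * h))
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
        path.append(list(w))
    return path

def pca(path, n_comp=2, iters=300):
    """Mean of the trajectory, its top principal directions, their variances, total variance."""
    n, m = len(path[0]), len(path)
    mean = [sum(p[k] for p in path) / m for k in range(n)]
    X = [[p[k] - mean[k] for k in range(n)] for p in path]
    C = [[sum(r[a] * r[b] for r in X) / m for b in range(n)] for a in range(n)]
    comps, variances = [], []
    for _ in range(n_comp):
        v = [1.0 + 0.1 * k for k in range(n)]
        for _ in range(iters):  # power iteration on the covariance matrix
            v = [sum(C[a][b] * v[b] for b in range(n)) for a in range(n)]
            for u in comps:  # stay orthogonal to directions already found
                d = sum(x * y for x, y in zip(v, u))
                v = [x - d * y for x, y in zip(v, u)]
            norm = math.sqrt(sum(x * x for x in v)) or 1.0
            v = [x / norm for x in v]
        comps.append(v)
        variances.append(sum(v[a] * C[a][b] * v[b] for a in range(n) for b in range(n)))
    return mean, comps, variances, sum(C[k][k] for k in range(n))

def error_surface(path, res=15, pad=0.5):
    """MSE on a res x res grid in the plane through the mean spanned by PC1 and PC2."""
    mean, (v1, v2), variances, total = pca(path)
    coords = [tuple(sum((p[k] - mean[k]) * v[k] for k in range(N_W)) for v in (v1, v2))
              for p in path]
    axes = []
    for dim in range(2):
        lo, hi = min(c[dim] for c in coords), max(c[dim] for c in coords)
        margin = pad * (hi - lo) + 1e-3  # extend the grid beyond the trajectory
        lo, hi = lo - margin, hi + margin
        axes.append([lo + (hi - lo) * i / (res - 1) for i in range(res)])
    # grid[i][j] is the error at PC2 = axes[1][i], PC1 = axes[0][j]
    grid = [[mse([mean[k] + a * v1[k] + b * v2[k] for k in range(N_W)]) for a in axes[0]]
            for b in axes[1]]
    return axes, grid, coords, sum(variances) / total

path = train()
axes, grid, coords, explained = error_surface(path)
print(f"MSE at start {mse(path[0]):.4f}, at end {mse(path[-1]):.4f}")
print(f"trajectory variance captured by PC1 + PC2: {explained:.3f}")
```

Drawing `grid` as a surface over `axes` (for example with a 3-D plotting library) gives a view of the kind the abstract describes, and `coords` places the learning trajectory on it. The `explained` ratio is worth checking: if the first two directions retain only a small share of the trajectory's variance, the plotted plane may miss much of what the network actually traversed.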