Decomposition Techniques for Multilayer Perceptron Training

In this paper, we consider the learning problem of multilayer perceptrons (MLPs), formulated as the minimization of a smooth error function. As is well known, MLP training can be a difficult nonlinear, nonconvex optimization problem: typical difficulties include extensive flat regions and steep-sided valleys in the error surface, as well as a possibly large number of training samples and free network parameters. We define a wide class of batch learning algorithms for MLPs based on block decomposition techniques applied to the minimization of the error function. The learning problem is decomposed into a sequence of smaller, structured minimization problems in order to exploit the structure of the objective function. Theoretical convergence results are established, and a specific algorithm is constructed and evaluated through extensive numerical experiments. Comparisons with state-of-the-art learning algorithms show the effectiveness of the proposed techniques.
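To make the block-decomposition idea concrete, the sketch below alternates between the hidden-layer and output-layer weights of a one-hidden-layer MLP with squared error: with the hidden weights fixed, the subproblem in the output weights is linear least squares and is solved exactly, while the hidden weights are then updated by a few gradient steps. This is only an illustrative sketch of the general idea under stated assumptions, not the specific algorithm proposed in the paper; the tanh activation, the two-block splitting, the fixed step size, and all names (train_mlp_two_blocks, W, v) are assumptions made for the example.

```python
import numpy as np

# Minimal sketch of a two-block decomposition for training a one-hidden-layer MLP
# with squared error. Illustrative only: the splitting, activation, and update rules
# are assumptions for this example, not the paper's actual algorithm.

def train_mlp_two_blocks(X, y, n_hidden=20, outer_iters=50,
                         inner_grad_steps=10, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((d, n_hidden)) * 0.1   # block 1: input-to-hidden weights
    v = rng.standard_normal(n_hidden) * 0.1        # block 2: hidden-to-output weights

    for _ in range(outer_iters):
        # Block 2: with W fixed, the problem in v is linear least squares,
        # so the output weights can be minimized exactly.
        H = np.tanh(X @ W)                          # hidden activations, shape (n, n_hidden)
        v, *_ = np.linalg.lstsq(H, y, rcond=None)

        # Block 1: with v fixed, take a few gradient steps on W.
        for _ in range(inner_grad_steps):
            H = np.tanh(X @ W)
            r = H @ v - y                           # residuals, shape (n,)
            # gradient of 0.5*||tanh(XW) v - y||^2 w.r.t. W (chain rule through tanh)
            grad_W = X.T @ (np.outer(r, v) * (1.0 - H**2))
            W -= lr * grad_W

    return W, v

if __name__ == "__main__":
    # Toy regression problem to exercise the sketch.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 3))
    y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
    W, v = train_mlp_two_blocks(X, y)
    mse = np.mean((np.tanh(X @ W) @ v - y) ** 2)
    print(f"training MSE: {mse:.4f}")
```

Solving exactly for the output weights exploits the fact that, for fixed hidden-layer weights, the error function is quadratic in that block; this is the kind of structure in the objective function that a decomposition scheme can take advantage of.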
