Online gradient method with smoothing ℓ0 regularization for feedforward neural networks

ℓp regularization has been a popular pruning method for neural networks. In the literature the parameter p is usually set within 0 < p ≤ 2, and practical training algorithms with ℓ0 regularization are lacking because of the NP-hard nature of the ℓ0 regularization problem. However, ℓ0 regularization tends to produce the sparsest solution, corresponding to the most parsimonious network structure, which is desirable from the viewpoint of generalization ability. To this end, this paper considers an online gradient training algorithm with smoothing ℓ0 regularization (OGTSL0) for feedforward neural networks, in which the ℓ0 regularizer is approximated by a sequence of smoothing functions. The underlying principle for the sparsity of OGTSL0 is provided, and the convergence of the algorithm is theoretically analyzed. Simulation examples support the theoretical analysis and illustrate the superiority of the proposed algorithm.
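To make the idea concrete, the following is a minimal sketch (not the authors' code) of online gradient training with a smoothed ℓ0 penalty. The Gaussian-type smoothing function f_sigma(w) = 1 - exp(-w^2/sigma^2), the toy data, and all hyperparameters are assumptions for illustration; the paper's exact smoothing functions and update rules may differ.

```python
# Sketch: online (sample-by-sample) gradient training of a one-hidden-layer
# network with a smoothed l0 penalty. Assumed smoothing: 1 - exp(-w^2/sigma^2),
# which approaches the l0 indicator as sigma -> 0.
import numpy as np

rng = np.random.default_rng(0)

def smoothed_l0(w, sigma):
    """Smooth approximation of the number of nonzero entries of w."""
    return np.sum(1.0 - np.exp(-(w ** 2) / sigma ** 2))

def smoothed_l0_grad(w, sigma):
    """Gradient of the smoothed l0 penalty with respect to w."""
    return (2.0 * w / sigma ** 2) * np.exp(-(w ** 2) / sigma ** 2)

# Toy regression data: y = sin(x), one input, one output.
X = rng.uniform(-np.pi, np.pi, size=(200, 1))
Y = np.sin(X)

n_hidden = 10
W1 = rng.normal(scale=0.5, size=(1, n_hidden))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))   # hidden -> output weights
eta, lam, sigma = 0.05, 1e-3, 0.5                # learning rate, penalty weight, smoothing width

for epoch in range(200):
    for x, y in zip(X, Y):                       # online updates, one sample at a time
        x = x.reshape(1, 1)
        h = np.tanh(x @ W1)                      # hidden activations
        out = h @ W2                             # network output
        err = out - y                            # prediction error

        # Backpropagated gradients of the squared error.
        gW2 = h.T @ err
        gW1 = x.T @ (err @ W2.T * (1.0 - h ** 2))

        # Gradient step on error plus lambda times the smoothed l0 penalty.
        W1 -= eta * (gW1 + lam * smoothed_l0_grad(W1, sigma))
        W2 -= eta * (gW2 + lam * smoothed_l0_grad(W2, sigma))

# The penalty drives many weights toward zero; small weights can then be pruned.
print("approx. nonzero hidden-output weights:", smoothed_l0(W2, sigma))
```

In this sketch the sparsity mechanism is visible directly in the update: the penalty gradient pushes small weights toward zero while leaving large weights nearly untouched, which is the qualitative behavior the abstract attributes to ℓ0-type regularization.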
