Least third-order cumulant method with adaptive regularization parameter selection for neural networks

Abstract
This paper introduces a useful property of the least third-order cumulant objective function: a solution is optimal when the gradients of both the mean squared error and the third-order cumulant error are zero vectors, and such optimal solutions are independent of the value of the regularization parameter λ. An adaptive regularization parameter selection method is also derived to control the convergence of the mean squared error and cumulant error terms. By changing the value of the regularization parameter, the proposed selection method can tunnel through sub-optimal solutions, whose locations are controllable. Consequently, the least third-order cumulant method with adaptive regularization parameter selection is theoretically capable of estimating an optimal solution when applied to regression problems.
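A minimal sketch of the structure this property relies on, assuming a regularized objective of the form below (the symbols J, E_MSE, E_c3 and w are illustrative notation introduced here, not taken from the paper):

\[
J(\mathbf{w}) \;=\; E_{\mathrm{MSE}}(\mathbf{w}) \;+\; \lambda\, E_{c_3}(\mathbf{w}),
\qquad
\nabla J(\mathbf{w}) \;=\; \nabla E_{\mathrm{MSE}}(\mathbf{w}) \;+\; \lambda\, \nabla E_{c_3}(\mathbf{w}).
\]

Under this assumed form, if both gradients vanish at some \(\mathbf{w}^{*}\), then \(\nabla J(\mathbf{w}^{*}) = \mathbf{0}\) for every value of \(\lambda\), which matches the λ-independence of the optimal solution stated above. At a stationary point where only the weighted sum of the two gradients is zero, the point's location shifts as \(\lambda\) is varied, which is consistent with the claim that sub-optimal solutions can be tunnelled through by changing the regularization parameter.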
