Subset-based training and pruning of sigmoid neural networks

In this paper we develop two algorithms, subset-based training (SBT) and subset-based training and pruning (SBTP), that exploit the fact that the Jacobian matrices arising in sigmoid network training problems are usually rank deficient. During training, the weight vector is divided into two parts according to the rank of the Jacobian. Both SBT and SBTP are trust-region methods. Compared with the standard Levenberg-Marquardt (LM) method, the two algorithms can achieve similar convergence properties with lower memory requirements. Furthermore, SBTP combines training and pruning of a network into a single procedure. The effectiveness of the two algorithms is evaluated on three examples, and comparisons are made with some existing algorithms. Some convergence properties of the two algorithms are also given to qualitatively assess their performance.
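
The idea of splitting the weights according to the Jacobian rank can be illustrated with a small sketch. The abstract does not spell out the selection procedure used by SBT and SBTP, so the code below is not that procedure. It is an assumed stand-in that uses greedy column-pivoted Gram-Schmidt to estimate the numerical rank and to pick a set of linearly independent Jacobian columns. The function name `split_weights_by_rank` and the tolerance `tol` are hypothetical.

```python
def split_weights_by_rank(J, tol=1e-8):
    """Split weight indices into two parts using the numerical rank of J.

    J is a Jacobian given as a list of rows (one row per residual,
    one column per weight). Returns (active, rest): `active` indexes a
    set of numerically independent columns, `rest` the remaining ones.
    This is an illustrative assumption, not the paper's selection rule.
    """
    m, n = len(J), len(J[0])
    # Copy the columns so the caller's J is left untouched.
    cols = [[J[i][j] for i in range(m)] for j in range(n)]
    # Scale for the relative tolerance: largest initial column norm.
    scale = max(sum(x * x for x in c) ** 0.5 for c in cols)
    remaining = list(range(n))
    active = []
    while remaining:
        # Pivot: the remaining column with the largest residual norm.
        j = max(remaining, key=lambda k: sum(x * x for x in cols[k]))
        norm = sum(x * x for x in cols[j]) ** 0.5
        if norm <= tol * scale:
            break  # everything left is numerically dependent
        q = [x / norm for x in cols[j]]
        active.append(j)
        remaining.remove(j)
        # Remove the component along q from each remaining column.
        for k in remaining:
            dot = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [a - dot * b for a, b in zip(cols[k], q)]
    return sorted(active), sorted(remaining)


# Toy Jacobian whose third column is twice the first, so its rank is 2.
J = [[1.0, 2.0, 2.0],
     [0.0, 1.0, 0.0],
     [1.0, 0.0, 2.0]]
active, rest = split_weights_by_rank(J)
print(active, rest)
```

Under this assumption, a reduced LM-type step would only need the Jacobian columns indexed by `active`, which is one plausible way a rank-based split could lower memory use. The weights in `rest` would be natural candidates to hold fixed or to consider for pruning.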
