Learning with first, second, and no derivatives: A case study in high energy physics

Abstract. In this paper, different algorithms for training multilayer perceptron architectures are applied to a significant discrimination task in high-energy physics. The One-Step Secant technique is compared with on-line backpropagation, the ‘Bold Driver’ batch version, and conjugate gradient methods. In addition, a new algorithm (the affine shaker) is proposed that uses stochastic search based only on function values, with affine transformations of the local search region. Although the affine shaker requires more CPU time to reach maximum generalization, the technique can be of interest for special-purpose VLSI implementations and for non-differentiable functions.
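The affine-shaker idea summarized above can be sketched as follows: a local search region around the current point is represented by a set of basis vectors; a random trial displacement is drawn inside the region (with a "double shot" that also tries the opposite direction), and the region is expanded along a successful direction or contracted after a failure via an affine transformation. The sketch below is a minimal, hedged reconstruction of this scheme, not the authors' exact implementation; the parameter names (`rho_expand`, `rho_contract`) and the specific update rule are illustrative assumptions.

```python
import numpy as np

def affine_update(B, d, rho):
    # Rescale the search region along direction d by the factor rho:
    # P = I + (rho - 1) * d_hat d_hat^T, applied to the region basis B.
    d_hat = d / (np.linalg.norm(d) + 1e-12)
    P = np.eye(len(d)) + (rho - 1.0) * np.outer(d_hat, d_hat)
    return B @ P

def affine_shaker(f, x0, rho_expand=2.0, rho_contract=0.5,
                  iters=2000, seed=0):
    """Derivative-free stochastic minimization (affine-shaker sketch).

    Only function values are used: a success expands the region along
    the winning direction, a double failure contracts it.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)        # initial search region: unit box
    fx = f(x)
    for _ in range(iters):
        # Random displacement inside the current affine region.
        d = rng.uniform(-1.0, 1.0, size=x.size) @ B
        for step in (d, -d):  # "double shot": try both directions
            fy = f(x + step)
            if fy < fx:
                x, fx = x + step, fy
                B = affine_update(B, step, rho_expand)
                break
        else:
            B = affine_update(B, d, rho_contract)
    return x, fx
```

Because the method touches only function values, it applies unchanged to non-differentiable objectives, which is the property the abstract highlights; for a smooth network error surface it trades extra function evaluations against the cost of computing gradients.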
