Sign-methods for training with imprecise error function and gradient values

Training algorithms suited to imprecise conditions are proposed. They require only the algebraic sign of the error function or of its gradient to be correct and, depending on how they update the weights, they are analyzed as composite nonlinear successive overrelaxation (SOR) methods or composite nonlinear Jacobi methods applied to the gradient of the error function. The local convergence behavior of the proposed algorithms is also studied. The approach appears practically useful when training is affected by technology imperfections, limited precision in operations and data, hardware component variations, or environmental changes that cause unpredictable deviations of parameter values from the designed configuration; in such settings it may be difficult or impossible to obtain very precise values of the error function and its gradient during training.
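
To make the weight-update idea concrete, the sketch below is an illustrative assumption rather than the paper's exact algorithm: the quadratic error, the step-size schedule, and all names are hypothetical. Each weight is moved by a small step whose direction is taken only from the sign of the corresponding partial derivative; the "jacobi" mode updates all weights simultaneously from the signs at the current point, while the "sor" mode sweeps through the weights sequentially, re-evaluating the sign after each coordinate has moved.

```python
import numpy as np

# Toy quadratic error E(w) = 0.5 * (w - w_star)^T A (w - w_star),
# with gradient g(w) = A (w - w_star).  Only sign(g) is assumed reliable.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
w_star = np.array([1.0, -2.0])   # hypothetical minimizer

def grad(w):
    return A @ (w - w_star)

def sign_train(w0, step=0.1, iters=200, mode="jacobi"):
    """Sign-based training sketch (illustrative, not the paper's method).

    mode="jacobi": all weights are updated simultaneously from the
                   gradient signs evaluated at the current point.
    mode="sor":    weights are updated one at a time, each sign being
                   re-evaluated after the previous coordinate has moved
                   (Gauss-Seidel / SOR-like sweep).
    """
    w = w0.astype(float).copy()
    for _ in range(iters):
        if mode == "jacobi":
            w -= step * np.sign(grad(w))            # simultaneous update
        else:
            for i in range(w.size):                 # sequential sweep
                w[i] -= step * np.sign(grad(w)[i])
        step *= 0.99                                # shrink step so iterates settle
    return w

print("jacobi:", sign_train(np.zeros(2), mode="jacobi"))
print("sor   :", sign_train(np.zeros(2), mode="sor"))
```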
