The generalized proportional-integral-derivative (PID) gradient descent back propagation algorithm

Abstract The back-propagation learning rule is modified by augmenting the classical gradient descent algorithm (which uses only a proportional term) with integral and derivative terms of the gradient. The effect of these terms on the convergence behaviour of the objective function is studied and compared with the momentum method (MOM). It is observed that, with appropriate tuning of the proportional-integral-derivative (PID) parameters, the rate of convergence is greatly improved and local minima can be overcome. The integral action also helps in locating a minimum quickly. A guideline is presented for tuning the PID parameters appropriately, and an “integral suppression scheme” is proposed that effectively uses the PID principles, resulting in faster convergence at a desired minimum.
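
The update rule described in the abstract can be illustrated with a short sketch. The following Python code is not from the paper; the gain names kp, ki, kd, the quadratic test objective, and the plain full-batch loop are illustrative assumptions that merely show how proportional, integral, and derivative terms of the gradient might be combined into a single weight update.

```python
import numpy as np

def pid_gradient_descent(grad_fn, w, kp=0.1, ki=0.01, kd=0.05, steps=200):
    """Hypothetical PID-style update: each step combines the current gradient
    (proportional term), the running sum of past gradients (integral term),
    and the change in the gradient between steps (derivative term).
    The gain values are illustrative, not the paper's tuning guideline."""
    integral = np.zeros_like(w)
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        integral = integral + g        # accumulate past gradients (integral)
        derivative = g - prev_grad     # finite difference of the gradient (derivative)
        w = w - (kp * g + ki * integral + kd * derivative)
        prev_grad = g
    return w

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is simply w.
w_min = pid_gradient_descent(lambda w: w, np.array([2.0, -1.5]))
print(w_min)  # approaches the minimum at the origin
```

Setting ki = kd = 0 recovers plain gradient descent with learning rate kp; the integral term behaves like an accumulated momentum on past gradients, while the derivative term reacts to changes in the gradient between successive steps.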
