Fast parallel off-line training of multilayer perceptrons

Various approaches to the parallel implementation of second-order, gradient-based multilayer perceptron training algorithms are described. Two main classes of algorithm are defined: Hessian-based and conjugate-gradient-based methods. The limited- and full-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithms are selected as representative examples and used to show that the step-size and gradient calculations are the critical components. For larger problems the matrix calculations in the full-memory algorithm also become significant. Various parallelization strategies are considered, the best of which is implemented on parallel virtual machine (PVM) and transputer-based architectures. Results from a range of problems demonstrate the performance achievable with each architecture. The transputer implementation gives excellent speed-ups, but the problem size is limited by memory constraints. The speed-ups achievable with the PVM implementation are much poorer because of inefficient communication, although memory is not a limitation.
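The parallelization strategy can be illustrated with a small sketch. The Python code below is an assumption on my part, not the authors' PVM or transputer implementation: it partitions the training set across worker processes, has each worker return a partial sum-of-squares error and gradient for a one-hidden-layer perceptron, combines these on the master, and then takes a full-memory BFGS step with a crude backtracking line search. All sizes, names (`chunk_loss_grad`, `train_bfgs`) and the toy target function are illustrative only.

```python
# Illustrative sketch only: data-parallel gradient evaluation for a one-hidden-layer
# MLP with a full-memory BFGS update. Python's multiprocessing stands in for the
# PVM / transputer message passing described in the paper; the network sizes,
# line search and toy data set are assumptions, not the authors' choices.
import numpy as np
from multiprocessing import Pool

N_IN, N_HID, N_OUT = 4, 8, 1                            # assumed toy network dimensions
N_W = N_HID * (N_IN + 1) + N_OUT * (N_HID + 1)          # total number of weights

def unpack(w):
    """Split the flat weight vector into layer matrices (bias in the last column)."""
    k = N_HID * (N_IN + 1)
    W1 = w[:k].reshape(N_HID, N_IN + 1)
    W2 = w[k:].reshape(N_OUT, N_HID + 1)
    return W1, W2

def chunk_loss_grad(args):
    """Sum-of-squares error and its gradient over one partition of the training set."""
    w, X, Y = args
    W1, W2 = unpack(w)
    Xb = np.hstack([X, np.ones((len(X), 1))])           # append bias input
    H = np.tanh(Xb @ W1.T)                              # hidden layer
    Hb = np.hstack([H, np.ones((len(H), 1))])
    P = Hb @ W2.T                                       # linear output layer
    E = P - Y
    loss = 0.5 * np.sum(E * E)
    dW2 = E.T @ Hb                                      # backpropagate the error
    dH = (E @ W2)[:, :N_HID] * (1.0 - H * H)
    dW1 = dH.T @ Xb
    return loss, np.concatenate([dW1.ravel(), dW2.ravel()])

def total_loss_grad(pool, w, chunks):
    """Farm the chunks out to the workers and sum the partial results on the master."""
    results = pool.map(chunk_loss_grad, [(w, X, Y) for X, Y in chunks])
    loss = sum(r[0] for r in results)
    grad = np.sum([r[1] for r in results], axis=0)
    return loss, grad

def train_bfgs(chunks, n_iter=50):
    w = 0.1 * np.random.default_rng(0).standard_normal(N_W)
    B = np.eye(N_W)                                     # inverse-Hessian approximation
    with Pool() as pool:
        loss, g = total_loss_grad(pool, w, chunks)
        for _ in range(n_iter):
            d = -B @ g                                  # search direction
            step = 1.0                                  # crude backtracking line search
            loss_new, g_new = total_loss_grad(pool, w + step * d, chunks)
            while loss_new >= loss and step > 1e-8:
                step *= 0.5
                loss_new, g_new = total_loss_grad(pool, w + step * d, chunks)
            s, y = step * d, g_new - g
            sy = s @ y
            if sy > 1e-12:                              # standard BFGS inverse update
                rho = 1.0 / sy
                V = np.eye(N_W) - rho * np.outer(s, y)
                B = V @ B @ V.T + rho * np.outer(s, s)
            w, loss, g = w + step * d, loss_new, g_new
    return w, loss

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, (400, N_IN))
    Y = np.sin(X.sum(axis=1, keepdims=True))            # assumed toy target function
    chunks = [(X[i::4], Y[i::4]) for i in range(4)]     # one partition per worker
    w, loss = train_bfgs(chunks)
    print("final sum-of-squares error:", loss)
```

Note that every line-search trial broadcasts the full weight vector to the workers and gathers their partial errors and gradients; this per-iteration communication is the overhead that, as the abstract reports, limits the speed-ups obtainable with the PVM implementation.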
