Parallel neural net training on the KSR1

Neural nets are used extensively in modern-day pattern recognition. General use of a feedforward neural net consists of a training phase followed by a classification phase. Classification of an unknown test vector is very fast, consisting only of propagating the test vector through the neural net. Training, by contrast, involves an optimization procedure and is very time-consuming, since a feasible local minimum must be sought in a high-dimensional weight space. In this paper we present an analysis of a parallel implementation of the backpropagation training algorithm using conjugate gradient optimization for a three-layer feedforward neural network on the KSR1 parallel shared-memory machine. We implement two parallel versions of the neural net training procedure on the KSR1: one using native code, the other using P4, a library of macros and functions. We present a speedup model and use it to clarify our experimental results. We identify the general requirements that make the parallel implementation worthwhile compared to sequential execution of the same neural net training procedure, and we assess the usefulness of a library of functions (such as P4) developed to ease the task of the programmer. From the experimental results we further identify the limits on processor utilization for our parallel training algorithm.
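To make the classification phase described above concrete, the following is a minimal sketch (not the paper's implementation) of forward propagation through a three-layer feedforward net with sigmoid units; the layer sizes, weight values, and the `sigmoid` helper are assumptions chosen purely for illustration.

```c
#include <math.h>
#include <stdio.h>

/* Illustrative layer sizes (assumptions, not taken from the paper). */
#define N_IN     4
#define N_HIDDEN 3
#define N_OUT    2

/* Logistic activation applied by each hidden and output unit. */
static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* Propagate one test vector through the net: classification amounts to
 * two matrix-vector products, each followed by the activation function. */
static void forward(const double w1[N_HIDDEN][N_IN + 1],   /* input->hidden weights (+bias) */
                    const double w2[N_OUT][N_HIDDEN + 1],  /* hidden->output weights (+bias) */
                    const double in[N_IN],
                    double hidden[N_HIDDEN],
                    double out[N_OUT])
{
    for (int j = 0; j < N_HIDDEN; j++) {
        double a = w1[j][N_IN];                 /* bias term */
        for (int i = 0; i < N_IN; i++)
            a += w1[j][i] * in[i];
        hidden[j] = sigmoid(a);
    }
    for (int k = 0; k < N_OUT; k++) {
        double a = w2[k][N_HIDDEN];             /* bias term */
        for (int j = 0; j < N_HIDDEN; j++)
            a += w2[k][j] * hidden[j];
        out[k] = sigmoid(a);
    }
}

int main(void)
{
    /* Arbitrary small weights, purely for demonstration. */
    double w1[N_HIDDEN][N_IN + 1] = {{ 0.1, -0.2,  0.3,  0.05, 0.0},
                                     { 0.2,  0.1, -0.1,  0.4,  0.1},
                                     {-0.3,  0.2,  0.2, -0.1,  0.0}};
    double w2[N_OUT][N_HIDDEN + 1] = {{ 0.5, -0.4, 0.3, 0.1},
                                      {-0.2,  0.6, 0.1, 0.0}};
    double in[N_IN] = {1.0, 0.5, -0.5, 0.25};
    double hidden[N_HIDDEN], out[N_OUT];

    forward(w1, w2, in, hidden, out);
    printf("outputs: %f %f\n", out[0], out[1]);
    return 0;
}
```

The training phase repeatedly evaluates this forward pass together with the backpropagated error gradient over the whole training set, which is what makes it costly and a natural candidate for parallelization.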