Benchmarking of the CM-5 and the Cray machines with a very large backpropagation neural network

In this paper, we present a new, efficient implementation of the backpropagation (BP) algorithm on the CM-5 that takes full advantage of the machine's Control Network to avoid explicit message passing. The nodes in the input and output layers are evenly distributed across all processors, all nodes in the hidden layer(s) are replicated on every processor, and the weights are distributed across the processors together with their corresponding nodes. We have implemented this algorithm on the CM-5 in MIMD mode using the C programming language. For a case study of protein tertiary structure prediction, we obtained a performance of 76 million weight updates per second (WUPS) on a 512-processor partition without vector units. Experiments with partitions of different sizes showed a nearly linear relationship between computation time and the number of processors, indicating good parallelization. We have also implemented the backpropagation algorithm on Cray machines using the C programming language. The Cray-2 implementation yields 10 million WUPS, the Cray X-MP EA implementation yields 18 million WUPS, and the Cray Y-MP M92 implementation yields 40 million WUPS.
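The node distribution described above can be illustrated with a minimal sketch in C. It is not the authors' code: it simulates the processors serially in one process, and every name and size in it (N_PROC, N_IN, N_HID, partial_hidden, global_sum) is hypothetical. Each simulated processor owns a slice of the input nodes and the weights attached to them, computes a partial net input for every hidden node, and a global sum then gives each processor the same full values, which is what replicating the hidden layer requires. The global_sum function is a plain loop standing in for whatever reduction the Control Network presumably provides; it is not a CM-5 library call. The activation function, the output layer, and the backward pass are omitted.

```c
#include <assert.h>

#define N_PROC 4  /* simulated processors (hypothetical size) */
#define N_IN   8  /* input nodes, assumed divisible by N_PROC */
#define N_HID  3  /* hidden nodes, replicated on every processor */

/* Partial net input to each hidden node, using only the inputs (and the
 * weights attached to them) that simulated processor p owns. */
void partial_hidden(int p, const double in[N_IN],
                    double w[N_IN][N_HID], double part[N_HID])
{
    int chunk = N_IN / N_PROC;
    assert(p >= 0 && p < N_PROC);
    for (int h = 0; h < N_HID; h++)
        part[h] = 0.0;
    for (int i = p * chunk; i < (p + 1) * chunk; i++)
        for (int h = 0; h < N_HID; h++)
            part[h] += in[i] * w[i][h];
}

/* Serial stand-in for a global sum reduction (an assumption, not a
 * CM-5 API): adds the per-processor partial sums element by element. */
void global_sum(double part[N_PROC][N_HID], double total[N_HID])
{
    for (int h = 0; h < N_HID; h++) {
        total[h] = 0.0;
        for (int p = 0; p < N_PROC; p++)
            total[h] += part[p][h];
    }
}

/* Net input of every hidden node via the distributed scheme. After the
 * reduction, every processor would hold the same net[] values. */
void hidden_net_parallel(const double in[N_IN],
                         double w[N_IN][N_HID], double net[N_HID])
{
    double part[N_PROC][N_HID];
    for (int p = 0; p < N_PROC; p++)
        partial_hidden(p, in, w, part[p]);
    global_sum(part, net);
}

/* Single-processor reference for comparison. */
void hidden_net_serial(const double in[N_IN],
                       double w[N_IN][N_HID], double net[N_HID])
{
    for (int h = 0; h < N_HID; h++) {
        net[h] = 0.0;
        for (int i = 0; i < N_IN; i++)
            net[h] += in[i] * w[i][h];
    }
}
```

Calling hidden_net_parallel and hidden_net_serial on the same inputs and weights should give matching results up to floating-point summation order. In this sketch the only step that crosses processor boundaries is the reduction, which is consistent with the abstract's point about avoiding explicit message passing.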