The Ring Array Processor: A Multiprocessing Peripheral for Connectionist Applications

Abstract

We have designed and implemented a Ring Array Processor (RAP) for fast implementation of our continuous speech recognition training algorithms, which are currently dominated by layered "neural" network calculations. The RAP is a multi-DSP system with a low-latency ring interconnection scheme using programmable gate array technology and a significant amount of local memory per node (4–16 Mbytes of dynamic memory and 256 Kbytes of fast static RAM). Theoretical peak performance is 128 MFLOPS/board. A working system with 20 nodes has been used for our research at rates of 200–300 million connections per second for probability evaluation, and at roughly 30–60 million connection updates per second for training. A fully functional system with 40 nodes has also been benchmarked at roughly twice these rates. While practical considerations such as workstation address space restrict current implementations to 64 nodes, the architecture scales to about 16,000 nodes. For problems with 2 units per processor, communication and control overhead would reduce peak performance on the error back-propagation algorithm to about 50% of a linear speedup. This report describes the motivation for the RAP and shows how the architecture matches the target algorithm. We further describe some of the key features of the hardware and software design.
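The abstract states that the ring architecture matches layered network calculations but does not give the mapping. As an illustration only, the following is a minimal sketch of one common ring scheme for a layer's matrix-vector product, and it is not taken from the RAP software. It assumes each node owns a contiguous block of output units (rows of the weight matrix) and initially holds one block of the input activations; at every step each node accumulates the partial products for the block it currently holds, then passes that block to its ring neighbor. The function name `ring_matvec` and the even block partitioning are hypothetical choices made for this example.

```python
def ring_matvec(weights, x, num_nodes):
    """Simulate y = weights * x on a ring of num_nodes processors.

    Hypothetical layout (an assumption, not the RAP's documented one):
    node p owns output rows [p*rows, (p+1)*rows) and starts out holding
    input block p. Blocks circulate around the ring, one hop per step.
    """
    n_out, n_in = len(weights), len(x)
    if n_out % num_nodes or n_in % num_nodes:
        raise ValueError("sizes must divide evenly among the nodes")
    rows = n_out // num_nodes   # output units per node
    cols = n_in // num_nodes    # input values per circulating block

    # held[p] = (index of the input block, its values) currently at node p
    held = [(p, x[p * cols:(p + 1) * cols]) for p in range(num_nodes)]
    y = [0.0] * n_out

    for _ in range(num_nodes):
        # Every node works on its local block (sequential here, parallel on hardware).
        for p, (block, values) in enumerate(held):
            for r in range(p * rows, (p + 1) * rows):
                row = weights[r]
                y[r] += sum(row[block * cols + j] * values[j]
                            for j in range(cols))
        # Ring shift: node p receives the block previously held by node p - 1.
        held = held[-1:] + held[:-1]
    return y


if __name__ == "__main__":
    W = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
    v = [1, 0, -1, 2]
    print(ring_matvec(W, v, num_nodes=2))
```

After `num_nodes` steps every input block has visited every node, so each node has the full sums for its own output units. In this scheme a node only ever exchanges data with its immediate neighbor, which is why a low-latency ring is a natural fit for it.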
