Warp: A Programmable Systolic Array Processor

CMU is currently building a programmable, 32-bit floating-point systolic array processor using only off-the-shelf components. The 10-cell processor, with each cell implemented on a single board, can process 1024-point complex FFTs at a rate of one FFT every 600 μs. Under program control, the same processor can perform many other primitive computations in signal, image, and vision processing, including two-dimensional convolution, dynamic programming, and real or complex matrix multiplication, at a rate of 100 million floating-point operations per second. This particular systolic array processor is called the Warp, suggesting that it can perform a variety of transformations at very high speed. For a mobile robot demonstration planned for 1985, the Warp is expected to speed up the navigation process by at least one order of magnitude. The Warp has a relatively simple architecture given its performance. The processor is a linear array of cells (or processing elements) that, in general, takes inputs at one end and produces outputs at the other. It can efficiently implement many systolic algorithms, in which communication between adjacent cells is intensive, as well as many non-systolic algorithms, in which each cell operates on its own data independently of the rest. This paper describes the structure of the Warp.
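
To make the idea of a linear systolic array concrete, the following is a minimal software sketch of a one-dimensional FIR filter (a simple relative of the convolutions mentioned above) running on a chain of cells. It is an illustration only and does not model the Warp's actual cell design, microcode, or timing. The function name `systolic_fir`, the register names, and the choice of a "weights stay, data moves" scheme (samples delayed two ticks per cell, partial sums one tick per cell) are all assumptions made for this example.

```python
def systolic_fir(weights, samples):
    """Simulate a linear systolic array computing y[n] = sum_k w[k] * x[n-k].

    Hypothetical model: cell j permanently holds weights[j]. Samples enter
    at the left end and partial sums leave at the right end. Each cell talks
    only to its immediate neighbours.
    """
    k = len(weights)
    if k == 0:
        raise ValueError("need at least one cell/weight")

    x1 = [0.0] * k  # first stage of each cell's two-stage sample delay
    x2 = [0.0] * k  # second stage: the sample value visible to the next cell
    y = [0.0] * k   # each cell's partial-sum output register
    out = []

    # Extra k - 1 ticks flush the last partial sums out of the array.
    for t in range(len(samples) + k - 1):
        x_in = samples[t] if t < len(samples) else 0.0  # left-end input
        y_in = 0.0                                      # fresh partial sum
        new_x1, new_x2, new_y = [], [], []
        for j in range(k):
            new_y.append(y_in + weights[j] * x_in)  # multiply-accumulate
            new_x2.append(x1[j])                    # advance the sample delay
            new_x1.append(x_in)
            # The next cell reads this cell's registers as they were
            # before this tick (all cells update simultaneously).
            x_in, y_in = x2[j], y[j]
        x1, x2, y = new_x1, new_x2, new_y
        if t >= k - 1:  # the first complete result appears after k ticks
            out.append(y[-1])
    return out


# An impulse at the input reproduces the weights at the output.
print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))  # prints [1.0, 2.0, 3.0, 0.0]
```

Because samples move through the array at half the speed of the partial sums, the sum for output n meets sample x[n-j] exactly when it passes through cell j, so no value is ever broadcast to more than one cell.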