The Warp Computer: Architecture, Implementation, and Performance

The Warp machine is a systolic array computer of linearly connected cells, each of which is a programmable processor capable of performing 10 million floating-point operations per second (10 MFLOPS). A typical Warp array includes ten cells and thus has a peak computation rate of 100 MFLOPS. The array can be extended with more cells to accommodate applications that can use the increased computational bandwidth. Warp is integrated as an attached processor into a Unix host system. Programs for Warp are written in a high-level language supported by an optimizing compiler. The first ten-cell prototype was completed in February 1986; delivery of production machines started in April 1987. Extensive experimentation with both the prototype and production machines has demonstrated that the Warp architecture is effective in the application domain of robot navigation as well as in other fields such as signal processing, scientific computation, and computer vision research. For these applications, Warp is typically several hundred times faster than a VAX 11/780-class computer. This paper describes the architecture, implementation, and performance of the Warp machine. Each major architectural decision is discussed and evaluated in light of system, software, and application considerations. The programming model and tools developed for the machine are also described. The paper concludes with performance data for a large number of applications.
