A System for Evaluating Performance and Cost of SIMD Array Designs

SIMD arrays are likely to become increasingly important as coprocessors in domain-specific systems as architects continue to leverage RAM technology in their designs. This work addresses the efficient evaluation of SIMD arrays with respect to complex applications while accounting for operating frequency and chip area. The underlying issues include the size of the architecture space, the lack of portability of the test programs, and the inherent complexity of simulating up to hundreds of thousands of processing elements. Our overall method is to combine architecture-level and Electronic Design Automation (EDA) level modeling by using an EDA-based tool to calibrate architectural simulations. The resulting system retains much of the high throughput of the architecture-level simulator while achieving accuracy similar to that of an early-pass EDA synthesis and circuit simulation. The computational cost of architecture-level simulation is addressed with a novel approach to trace-based simulation, which we call trace compilation. We find it to be one to two orders of magnitude faster than instruction-level simulation while retaining much of the accuracy of the model. Furthermore, traces must be generated for only a small fraction of the possible parameter combinations. Trace compilation also addresses program portability by allowing the user to code in a single data-parallel language with a single compiler, regardless of the target architecture. We have used our system to evaluate thousands of potential SIMD array designs with respect to real applications, and we present some sample results.
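The idea of reusing one trace across many design points, then scaling the result by calibrated frequency and area, can be illustrated with a minimal sketch. This is not the authors' system: the trace format, the cost model (one cycle per datapath-width slice of an operand), the problem size, and every number in `CALIBRATION` are hypothetical placeholders chosen only to show the shape of the computation.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class Design:
    datapath_bits: int  # assumed width of each PE's ALU
    num_pes: int        # assumed number of physical processing elements


# Hypothetical trace summary: (operand width in bits, number of array operations).
TRACE = [(8, 100), (32, 10)]
PIXELS = 4096  # assumed problem size, one data element per virtual PE

# Made-up calibration table standing in for EDA results:
# datapath_bits -> (clock in MHz, area per PE in arbitrary units).
CALIBRATION = {1: (200.0, 1.0), 8: (150.0, 6.0), 32: (100.0, 20.0)}


def trace_cycles(trace, design):
    """Re-cost the same trace for a given design without re-running the program."""
    # Assumed cost model: an operation takes one cycle per datapath-width slice.
    per_element = sum(n * ceil(bits / design.datapath_bits) for bits, n in trace)
    # Each physical PE processes this many data elements in sequence.
    virtualization = ceil(PIXELS / design.num_pes)
    return per_element * virtualization


def evaluate(trace, design):
    """Return (execution time in seconds, total array area) for one design point."""
    clock_mhz, pe_area = CALIBRATION[design.datapath_bits]
    seconds = trace_cycles(trace, design) / (clock_mhz * 1e6)
    return seconds, pe_area * design.num_pes


if __name__ == "__main__":
    for bits in CALIBRATION:
        for pes in (1024, 4096):
            d = Design(bits, pes)
            t, a = evaluate(TRACE, d)
            print(d, f"time={t:.2e}s area={a:.0f}")
```

In this toy model the trace is produced once and only the per-design cost function changes, which is the property that makes sweeping a large design space inexpensive.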
