TRIPS: A polymorphous architecture for exploiting ILP, TLP, and DLP

This paper describes the polymorphous TRIPS architecture that can be configured for different granularities and types of parallelism. The TRIPS architecture is the first in a class of post-RISC, dataflow-like instruction sets called explicit data-graph execution (EDGE). This EDGE ISA is coupled with hardware mechanisms that enable the processing cores and the on-chip memory system to be configured and combined in different modes for instruction, data, or thread-level parallelism. To adapt to small and large-grain concurrency, the TRIPS architecture prototype contains two out-of-order, 16-wide-issue grid processor cores, which can be partitioned when easily extractable fine-grained parallelism exists. This approach to polymorphism provides better performance across a wide range of application types than an approach in which many small processors are aggregated to run workloads with irregular parallelism. Our results show that high performance can be obtained in each of the three modes---ILP, TLP, and DLP---demonstrating the viability of the polymorphous coarse-grained approach for future microprocessors.

[1]  Seth Copen Goldstein,et al.  PipeRench: A Reconfigurable Architecture and Compiler , 2000, Computer.

[2]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[3]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[4]  Lizy Kurian John,et al.  Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements , 2003, IEEE Trans. Computers.

[5]  B. R. Rau,et al.  HPL-PD Architecture Specification:Version 1.1 , 2000 .

[6]  Josep Torrellas,et al.  Architectural support for scalable speculative parallelization in shared-memory multiprocessors , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[7]  Carl Ebeling,et al.  Configurable computing: the catalyst for high-performance architectures , 1997, Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors.

[8]  Karthikeyan Sankaralingam,et al.  Universal Mechanisms for Data-Parallel Architectures , 2003, MICRO.

[9]  Seung-Moon Yoo,et al.  FlexRAM: toward an advanced intelligent memory system , 1999, Proceedings 1999 IEEE International Conference on Computer Design: VLSI in Computers and Processors (Cat. No.99CB37040).

[10]  Yale N. Patt,et al.  Performance benefits of large execution atomic units in dynamically scheduled machines , 1989, ICS '89.

[11]  Matthew Mattina,et al.  Tarantula: a vector extension to the alpha architecture , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[12]  Scott A. Mahlke,et al.  IMPACT: an architectural framework for multiple-instruction-issue processors , 1991, ISCA '91.

[13]  Manoj Franklin,et al.  The PEWs microarchitecture: reducing complexity through data-dependence based decentralization , 1998, Microprocess. Microsystems.

[14]  William J. Dally,et al.  A bandwidth-efficient architecture for media processing , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[15]  M. Oskin,et al.  Active Pages: a computation model for intelligent memory , 1998, Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235).

[16]  Scott Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[17]  Quinn Jacobson,et al.  Control flow speculation in multiscalar processors , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.

[18]  Kaushik Roy,et al.  VSV: L2-Miss-Driven Variable Supply-Voltage Scaling for Low Power , 2003, MICRO.

[19]  Balaram Sinharoy,et al.  POWER4 system microarchitecture , 2002, IBM J. Res. Dev..

[20]  William J. Dally,et al.  Imagine: Media Processing with Streams , 2001, IEEE Micro.

[21]  BurgerDoug,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002 .

[22]  Michael L. Scott,et al.  Dynamic frequency and voltage control for a multiple clock domain microarchitecture , 2002, MICRO.

[23]  Hai Li,et al.  VSV: L2-miss-driven variable supply-voltage scaling for low power , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36..

[24]  William J. Dally,et al.  Smart Memories: a modular reconfigurable architecture , 2000, ISCA '00.

[25]  R. E. Kessler,et al.  The Alpha 21264 Microprocessor: Out-of-Order Execution at 600 Mhz , 1998 .

[26]  José E. Moreira,et al.  Evaluation of a multithreaded architecture for cellular computing , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[27]  R. Nagarajan,et al.  A design space evaluation of grid processor architectures , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[28]  Thomas L. Sterling The Gilgamesh MIND Processor-in-Memory Architecture for Petaflops-Scale Computing , 2002, ISHPC.

[29]  Wen-mei W. Hwu,et al.  IMPACT: an architectural framework for multiple-instruction-issue processors , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[30]  Microsystems Sun,et al.  Jini^ Architecture Specification Version 2.0 , 2003 .

[31]  Ho-Seop Kim,et al.  An instruction set and microarchitecture for instruction level distributed processing , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.

[32]  Arvind,et al.  Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1990, IEEE Trans. Computers.

[33]  Antonia Zhai,et al.  A scalable approach to thread-level speculation , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[34]  Brad Calder,et al.  Automatically characterizing large scale program behavior , 2002, ASPLOS X.

[35]  Markus Weinhardt,et al.  PACT XPP—A Self-Reconfigurable Data Processing Architecture , 2003, The Journal of Supercomputing.

[36]  Luiz André Barroso,et al.  Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[37]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[38]  Calvin Lin,et al.  Combining Hyperblocks and Exit Prediction to Increase Front-End Bandwidth and Performance , 2002 .

[39]  B. Ramakrishna Rau,et al.  Instruction-level parallel processing: History, overview, and perspective , 2005, The Journal of Supercomputing.

[40]  Vivek Sarkar,et al.  Baring It All to Software: Raw Machines , 1997, Computer.

[41]  Simha Sethumadhavan,et al.  Scalable Hardware Memory Disambiguation for High-ILP Processors , 2004, IEEE Micro.