Performance Evaluation of Dynamic Speculative Multithreading with the Cascadia Architecture

Thread-level parallelism (TLP) has been extensively studied in order to overcome the limitations of exploiting instruction-level parallelism (ILP) on high-performance superscalar processors. One promising method of exploiting TLP is dynamic speculative multithreading (D-SpMT), which extracts multiple threads from a sequential program without compiler support or instruction set extensions. This paper introduces Cascadia, a D-SpMT multicore architecture that provides multigrain thread-level support and is used to evaluate the performance of several benchmarks. Cascadia applies a unique sustainable IPC (sIPC) metric on a comprehensive loop tree to select the best performing nested loop level to multithread. This paper also discusses the relationships that loops have on one another, in particular, how loop nesting levels can be extended through procedures. In addition, a detailed study is provided on the effects that thread granularity and interthread dependencies have on the entire system.

[1]  Chen Yang,et al.  A cost-driven compilation framework for speculative parallelization of sequential programs , 2004, PLDI '04.

[2]  Gurindar S. Sohi,et al.  Program Demultiplexing: Data-flow based Speculative Parallelization of Methods in Sequential Programs , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[3]  John Paul Shen,et al.  Multiple Instruction Stream Processor , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[4]  Guilherme Ottoni,et al.  From sequential programs to concurrent threads , 2006, IEEE Computer Architecture Letters.

[5]  David R. Kaeli,et al.  Characterization and Evaluation of Hardware Loop Unrolling , 2002 .

[6]  Ben Lee,et al.  NetSin: An Object-Oriented Architectural Simulator Suite , 2005, CDES.

[7]  Monica S. Lam,et al.  In search of speculative thread-level parallelism , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[8]  Todd M. Austin,et al.  SimpleScalar: An Infrastructure for Computer System Modeling , 2002, Computer.

[9]  Haitham Akkary,et al.  A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[10]  Wei-Chung Hsu,et al.  Dynamic helper threaded prefetching on the Sun UltraSPARC/spl reg/ CMP processor , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[11]  Guilherme Ottoni,et al.  Global Multi-Threaded Instruction Scheduling , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[12]  Kunle Olukotun,et al.  The common case transactional behavior of multithreaded programs , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[13]  Rajiv Gupta,et al.  Copy or Discard execution model for speculative parallelization on multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[14]  Kunle Olukotun,et al.  Data speculation support for a chip multiprocessor , 1998, ASPLOS VIII.

[15]  Gurindar S. Sohi,et al.  Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[16]  Milind Girkar,et al.  On the performance potential of different types of speculative thread-level parallelism: The DL version of this paper includes corrections that were not made available in the printed proceedings , 2006, ICS '06.

[17]  Babak Falsafi,et al.  Implicitly-multithreaded processors , 2003, ISCA '03.

[18]  Marc Tremblay,et al.  A Third-Generation 65nm 16-Core 32-Thread Plus 32-Scout-Thread CMT SPARC® Processor , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[19]  Jaume Abella,et al.  Penelope: The NBTI-Aware Processor , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[20]  Miodrag Potkonjak,et al.  MediaBench: a tool for evaluating and synthesizing multimedia and communications systems , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[21]  Antonia Zhai,et al.  Loop Selection for Thread-Level Speculation , 2005, LCPC.

[22]  Satoshi Matsushita,et al.  Pinot: speculative multi-threading processor architecture exploiting parallelism over a wide range of granularities , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[23]  Ben Lee,et al.  Dynamic Simultaneous Multithreaded Architecture , 2003, ISCA PDCS.

[24]  Yun Zhang,et al.  Revisiting the Sequential Programming Model for Multi-Core , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[25]  Gurindar S. Sohi,et al.  ARB: A Hardware Mechanism for Dynamic Reordering of Memory References , 1996, IEEE Trans. Computers.

[26]  Antonio González,et al.  Thread-spawning schemes for speculative multithreading , 2002, Proceedings Eighth International Symposium on High Performance Computer Architecture.

[27]  Dean M. Tullsen,et al.  Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices , 2005, PLDI '05.

[28]  Avi Mendelson,et al.  Speculative synchronization and thread management for fine granularity threads , 2006, The Twelfth International Symposium on High-Performance Computer Architecture, 2006..

[29]  John L. Henning SPEC CPU2000: Measuring CPU Performance in the New Millennium , 2000, Computer.

[30]  Makoto Kobayashi Dynamic Characteristics of Loops , 1984, IEEE Transactions on Computers.

[31]  Gurindar S. Sohi,et al.  Speculative Multithreaded Processors , 2001, Computer.

[32]  Dean M. Tullsen,et al.  Maximizing TLP with loop-parallelization on SMT , 2001 .

[33]  Jenn-Yuan Tsai,et al.  The superthreaded architecture: thread pipelining with run-time data dependence checking and control speculation , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[34]  Antonio González,et al.  Clustered speculative multithreaded processors , 1999, ICS '99.

[35]  A. J. KleinOsowski,et al.  MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research , 2002, IEEE Computer Architecture Letters.

[36]  Marc Tremblay,et al.  A three dimensional register file for superscalar processors , 1995, Proceedings of the Twenty-Eighth Annual Hawaii International Conference on System Sciences.

[37]  Antonio González,et al.  Control speculation in multithreaded processors through dynamic loop detection , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[38]  Wei Liu,et al.  POSH: a TLS compiler that exploits program structure , 2006, PPoPP '06.

[39]  Gurindar S. Sohi,et al.  Speculative versioning cache , 1998, Proceedings 1998 Fourth International Symposium on High-Performance Computer Architecture.

[40]  Dean M. Tullsen,et al.  Control Flow Optimization Via Dynamic Reconvergence Prediction , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[41]  John Paul Shen,et al.  Mitosis: A Speculative Multithreaded Processor Based on Precomputation Slices , 2008, IEEE Transactions on Parallel and Distributed Systems.