Software data-triggered threads

The data-triggered threads (DTT) programming and execution model can increase parallelism and eliminate redundant computation. However, the initial proposal requires significant architectural support, which prevents existing applications and architectures from taking advantage of the model. This work proposes a pure software solution that supports the DTT model without any hardware support, using a prototype compiler and runtime libraries that run on existing machines. Several enhancements to the initial software implementation are presented that further improve performance. The software runtime system improves the performance of serial C SPEC benchmarks by 15% on a Nehalem processor, but by over 7X over the full suite of single-thread applications. The DTT model is also shown to work in conjunction with traditional parallelism, providing up to 64X speedup over parallel applications that exploit traditional parallelism.
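To make the idea of eliminating redundant computation concrete, the following is a minimal sketch in C of a software-only triggering store. It assumes that dependent computation needs to run only when a store actually changes the watched value. All names here (`dtt_field`, `dtt_store`, `recompute_square`) are hypothetical and are not taken from the prototype compiler or runtime libraries. The handler also runs inline for simplicity, whereas a real runtime could dispatch it to a separate thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names for illustration only. */
typedef void (*dtt_handler)(void *ctx);

typedef struct {
    int value;               /* the watched data                      */
    dtt_handler handler;     /* computation that depends on the data  */
    void *ctx;               /* argument passed to the handler        */
    unsigned long triggers;  /* stores that changed the value         */
    unsigned long skipped;   /* redundant stores (same value written) */
} dtt_field;

static void dtt_init(dtt_field *f, int initial, dtt_handler h, void *ctx)
{
    assert(f != NULL && h != NULL);
    f->value = initial;
    f->handler = h;
    f->ctx = ctx;
    f->triggers = 0;
    f->skipped = 0;
}

/* Triggering store: run the dependent computation only when the stored
 * value differs from the current one. Returns 1 if the handler ran,
 * 0 if the store was redundant and the computation was skipped.
 * Assumption: the handler runs synchronously here; a threaded runtime
 * would hand it off to another thread instead. */
static int dtt_store(dtt_field *f, int v)
{
    assert(f != NULL && f->handler != NULL);
    if (f->value == v) {
        f->skipped++;
        return 0;
    }
    f->value = v;
    f->handler(f->ctx);
    f->triggers++;
    return 1;
}

/* Example dependent computation: cache the square of the watched value. */
typedef struct {
    const dtt_field *src;
    long result;
} square_ctx;

static void recompute_square(void *ctx)
{
    square_ctx *c = (square_ctx *)ctx;
    c->result = (long)c->src->value * (long)c->src->value;
}
```

In this sketch, writing the same value twice costs one comparison instead of a full recomputation, which is the kind of redundancy the model is meant to remove. The counters exist only to make the skipped work observable.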
