smt- SPRINTS: Software Precomputation with Intelligent Streaming for Resource-Constrained SMTs

We present SPRINTS, a source-level speculative precomputation framework for scientific applications running on SMTs with two execution contexts. Our framework targets memory-bound applications and reduces memory latency by prefetching long streams of delinquent data accesses. A unique aspect of SPRINTS is that it requires neither hardware nor compiler support. It is based on partial cache simulation and a compression algorithm which can accurately summarize very long streams of cache misses. SPRINTS extracts patterns from the streams, which are in turn used to generate source-level, highly optimized precomputation code. SPRINTS achieves significant performance improvements over plain thread-level parallelization and indiscriminate precomputation based on code cloning. We demonstrate these improvements using two realistic scientific applications.

[1]  Gurindar S. Sohi,et al.  A Quantitative Framework for Pre-Exe-cution Thread Selection , 2002 .

[2]  D. Marr,et al.  Hyper-Threading Technology Architecture and MIcroarchitecture , 2002 .

[3]  Chi-Keung Luk,et al.  Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.

[4]  Donald Yeung,et al.  A study of source-level compiler algorithms for automatic construction of pre-execution code , 2004, TOCS.

[5]  Trishul M. Chilimbi Efficient representations and abstractions for quantifying and exploiting data reference locality , 2001, PLDI '01.

[6]  Microsystems Sun UltraSPARC IV Processor Architecture Overview , 2004 .

[7]  Stephane Eranian,et al.  The perfmon2 interface specification , 2005 .

[8]  John Paul Shen,et al.  Speculative precomputation: long-range prefetching of delinquent loads , 2001, Proceedings 28th Annual International Symposium on Computer Architecture.

[9]  Martin Hirzel,et al.  Dynamic hot data stream prefetching for general-purpose programs , 2002, PLDI '02.

[10]  H. Jin,et al.  - 3-The OpenMP Implementation of NAS Parallel Benchmarks and Its Performance , 1999 .

[11]  Gurindar S. Sohi,et al.  A quantitative framework for automated pre-execution thread selection , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[12]  John Paul Shen,et al.  Post-pass binary adaptation for software-based speculative precomputation , 2002, PLDI '02.

[13]  K. Sundaramoorthy,et al.  Slipstream processors: improving both performance and fault tolerance , 2000, SIGP.

[14]  Balaram Sinharoy,et al.  IBM Power5 chip: a dual-core multithreaded processor , 2004, IEEE Micro.

[15]  Dimitrios S. Nikolopoulos,et al.  Runtime support for integrating precomputation and thread-level parallelism on simultaneous multithreaded processors , 2004 .

[16]  Eric Rotenberg,et al.  Slipstream processors: improving both performance and fault tolerance , 2000, SIGP.

[17]  John Paul Shen,et al.  Dynamic speculative precomputation , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.