Exploiting TLS Parallelism at Multiple Loop-Nest Levels

As the number of cores integrated onto a single chip increases, architecture and compiler designers face the challenge of utilizing these cores to improve the performance of a single application. Thread-level speculation (TLS) can help by allowing possibly dependent threads to execute speculatively in parallel. Extracting speculative threads from sequential applications is key to efficient TLS execution. Previous work on thread extraction has focused on parallelizing iterations from a single loop-nest level or on function continuations. However, the amount of parallelism available at a single loop-nest level is sometimes limited, forcing us to look for parallelism across multiple loop-nest levels. In this paper we propose SpecOPTAL, a compiler algorithm that statically allocates cores to threads extracted from different levels of loop nests. We show that a subset of the SPEC 2006 benchmarks benefits from the proposed technique.