Loop Selection for Thread-Level Speculation

Thread-level speculation (TLS) allows potentially dependent threads to execute speculatively in parallel, which makes it easier for the compiler to extract parallel threads. However, the high cost of load imbalance, failed speculation, and inter-thread value communication makes it difficult to obtain the desired performance unless the speculative threads are carefully chosen. In this paper, we focus on extracting parallel threads from loops in general-purpose applications because loops, with their regular structure and significant coverage of execution time, are ideal candidates. General-purpose applications, however, usually contain a large number of nested loops whose parallel performance is unpredictable and whose behavior is dynamic, which makes it difficult to decide which set of loops should be parallelized to improve overall program performance. Our proposed loop selection algorithm addresses these difficulties. We have found that (i) with the aid of profiling information, compiler analyses can estimate the performance of parallel execution with reasonable accuracy, and that (ii) different invocations of a loop may behave differently, and exploiting this dynamic behavior can further improve performance. With a judicious choice of loops, we can improve the overall program performance of SPEC2000 integer benchmarks by as much as 20%.
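
To make the selection problem concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes, purely for illustration, that at most one loop on any nesting path may be parallelized and that each loop carries a single profile-derived estimate of the cycles saved by running it speculatively in parallel. The `Loop` class, the `est_saving` field, and the `select_loops` function are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Loop:
    name: str
    # Hypothetical estimate: sequential cycles minus estimated speculative
    # parallel cycles for this loop (negative means a slowdown).
    est_saving: float
    children: List["Loop"] = field(default_factory=list)


def select_loops(loop: Loop) -> Tuple[float, List[str]]:
    """Return (total estimated saving, chosen loop names) for a loop nest.

    Illustrative assumption: a loop and any loop nested inside it cannot
    both be parallelized, so each nest either picks the outer loop or the
    best selection from its inner loops.
    """
    inner_saving = 0.0
    inner_choice: List[str] = []
    for child in loop.children:
        saving, names = select_loops(child)
        inner_saving += saving
        inner_choice += names
    # Parallelize this loop only if it beats the best inner selection.
    if loop.est_saving > inner_saving:
        return loop.est_saving, [loop.name]
    return inner_saving, inner_choice


# Made-up numbers: the two inner loops together beat the outer loop.
nest = Loop("outer", 10.0, [Loop("inner_a", 8.0), Loop("inner_b", 5.0)])
best_saving, chosen = select_loops(nest)
```

In this sketch a loop with a negative estimate is never chosen, because selecting nothing in a subtree counts as a saving of zero. Handling invocations of the same loop that behave differently, as point (ii) describes, would need per-invocation estimates, which this sketch does not model.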
