Exploiting Speculative Thread-Level Parallelism in Data Compression Applications

Although hardware support for Thread-Level Speculation (TLS) eases the compiler's task of creating parallel programs by allowing it to create potentially dependent parallel threads, advanced compiler optimization techniques must still be developed and judiciously applied to achieve the desired performance. In this paper, we closely examine two data compression benchmarks, GZIP and BZIP2, and propose, implement, and evaluate new compiler optimization techniques that eliminate performance bottlenecks in their parallel execution. The proposed techniques (i) remove the critical forwarding path created by synchronizing memory-resident values; (ii) identify and categorize reduction-like variables whose intermediate results are used within loops, and propose code transformations to remove the inter-thread data dependences caused by these variables; and (iii) transform the program to eliminate stalls caused by variations in thread size. While no previous work has reported significant performance improvement from parallelizing these two benchmarks, we achieve up to 36% performance improvement for GZIP and 37% for BZIP2.
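
To make technique (ii) concrete, the following is a minimal sketch of how a reduction-like variable whose intermediate value is read inside the loop might be decoupled across threads. It is an illustration under stated assumptions, not the transformation implemented in the paper: the function names `sequential` and `chunked`, the running-total example, and the two-phase fix-up are all hypothetical, and the chunks are processed in an ordinary loop rather than by speculative threads.

```python
def sequential(items):
    """Original loop: `total` is updated every iteration and its
    intermediate value is also read every iteration, which creates a
    dependence between consecutive iterations."""
    total = 0
    offsets = []
    for n in items:
        offsets.append(total)  # intermediate use of the reduction-like variable
        total += n             # reduction-style update
    return offsets, total


def chunked(items, num_chunks):
    """Hypothetical transformed loop: each chunk accumulates into a
    private partial sum that starts at 0, so chunks no longer depend on
    one another. `num_chunks` must be a positive integer."""
    size = max(1, -(-len(items) // num_chunks))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]

    # Phase 1: independent per chunk (this is the part that could run
    # in parallel threads; here it is a plain loop for illustration).
    partial = []
    for chunk in chunks:
        local = 0
        local_offsets = []
        for n in chunk:
            local_offsets.append(local)
            local += n
        partial.append((local_offsets, local))

    # Phase 2: short sequential fix-up that adds each chunk's starting
    # value (the sum of all earlier chunks) to its recorded uses.
    offsets, base = [], 0
    for local_offsets, local in partial:
        offsets.extend(base + o for o in local_offsets)
        base += local
    return offsets, base
```

Under these assumptions, `chunked(items, k)` returns the same pair as `sequential(items)` for any positive `k`; the only remaining cross-chunk work is the fix-up pass, which touches one running value per chunk.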
