Speculative Program Parallelization with Scalable and Decentralized Runtime Verification

Thread-Level Speculation (TLS) is a dynamic code parallelization technique proposed to keep software in pace with advances in hardware, in particular by automatically parallelizing programs to take advantage of multi-core processors. Being speculative, frameworks of this type unavoidably rely on verification systems that resemble software transactional memory and that require either voluminous inter-thread communication or centralized registration of the memory accesses performed. Such a high degree of communication runs against the basic principles of high-performance parallel computing: it does not scale with an increasing number of processor cores and yields weak performance. Moreover, TLS systems often apply a single parallelization strategy, which consists of slicing a loop into several parallel speculative threads. This strategy also runs against those principles, since loops in the original serial code are not necessarily parallel and since a parallel schedule must promote data locality, which is well known to be crucial for good performance. This situation calls for scalable and decentralized verification systems, and for new strategies that dynamically generate efficient parallel code through advanced optimizing and parallelizing transformations. Such transformations require a more complex verification system that allows intra-thread iterations to be reordered. In this paper, we propose a verification system of this kind, based on a model that is built at runtime and predicts linear memory behavior. This strategy is part of Apollo, a speculative code parallelizer based on an adaptation of the polyhedral model for dynamic use.
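To make the idea of a runtime-built linear model concrete, the following is a minimal sketch and not Apollo's actual implementation. It assumes a single memory access whose address is an affine function of one loop counter, and the names `fit_linear` and `verify_chunk` are hypothetical. A few profiled iterations are used to fit the model, after which each thread can check its own iterations against the prediction without exchanging access logs with other threads.

```python
def fit_linear(samples):
    """Fit addr = base + stride * i from profiled (iteration, address) pairs.

    Returns (base, stride), or None if the samples are not linear.
    Hypothetical helper: the single-counter model is an assumption.
    """
    if len(samples) < 2:
        return None
    (i0, a0), (i1, a1) = samples[0], samples[1]
    if i1 == i0 or (a1 - a0) % (i1 - i0) != 0:
        return None
    stride = (a1 - a0) // (i1 - i0)
    base = a0 - stride * i0
    # Every profiled sample must agree with the fitted model.
    if any(base + stride * i != a for i, a in samples):
        return None
    return base, stride


def verify_chunk(model, actual_address, iterations):
    """Thread-local check of one chunk of iterations.

    Returns the first iteration whose real address differs from the
    prediction (a misspeculation, so a rollback would be needed),
    or None if the whole chunk matches. No shared state is touched.
    """
    base, stride = model
    for i in iterations:
        if actual_address(i) != base + stride * i:
            return i
    return None


# Profile four iterations of an access that walks an array of 8-byte elements.
profile = [(i, 1000 + 8 * i) for i in range(4)]
model = fit_linear(profile)

# An access that stops being linear at iteration 50 is caught locally.
first_bad = verify_chunk(
    model, lambda i: 1000 + 8 * i if i < 50 else 4096, range(4, 100)
)
```

Because the check compares each access only against the predicted address, the order in which a thread visits its iterations does not matter, which is the property that reordering transformations rely on.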
