LAMPVIEW: A LOOP-AWARE TOOLSET FOR FACILITATING PARALLELIZATION

Continued growth in the number of transistors per unit area, coupled with diminishing returns from traditional microarchitectural and clock-frequency improvements, has led processor manufacturers to place multiple cores on a single chip. However, only multi-threaded code can take full advantage of multicore processors; legacy single-threaded code does not benefit. Many approaches to parallelization have been explored, including both manual and automatic techniques. Unfortunately, research in this area is impeded by the inherent difficulty of exploring code by hand for new parallelization schemes. Whether a researcher is trying to discover new automatic techniques or a programmer is parallelizing code manually, the benefits of good dependence information are substantial. This thesis presents a profiling and analysis toolset aimed at easing a programmer’s or researcher’s effort in finding parallelism. The toolset, the Loop-Aware Memory Profile Viewing System (LAMPView), is developed in three parts. The first part is a multi-frontend, multi-target compiler pass that instruments the code with calls to the Loop-Aware Memory Profiling (LAMP) library. The compile-time instrumentation was partially developed previously and has been augmented here with additional features. The second part is a post-runtime processing pass that translates the output of the profiling run from a machine-level view to a source-level view. As it translates, it also processes and sorts dependence information. The third and final part is a pair of stand-alone utilities that take the translated information and provide the user with human-readable output that is searchable by various parameters. Discussions with potential users indicate that the toolset eases the analysis of parallelism opportunities.
