A Generalized Framework for Automatic Scripting Language Parallelization

Computational scientists are typically not expert programmers, and thus work in easy-to-use dynamic languages. However, their large datasets and experimental setups impose very high performance requirements. The performance required for computational science must therefore be extracted from dynamic languages in a manner that is transparent to the programmer. Current approaches to optimizing and parallelizing dynamic languages, such as just-in-time compilation and highly optimized interpreters, require a huge amount of implementation effort and are typically effective for only a single language. Scientists in different fields, however, use different languages, depending on their needs. This paper presents techniques for automatically extracting parallelism from scripts that apply across multiple dynamic scripting languages. The key insight is that combining a script with its interpreter, through program specialization techniques, embeds any parallelism within the script into the combined program, from which it can then be extracted via automatic parallelization techniques. Additionally, this paper presents several enhancements to existing speculative automatic parallelization techniques to handle the dependence patterns created by the specialization process. A prototype of the proposed technique, called Partial Evaluation with Parallelization (PEP), is evaluated on two open-source script interpreters, each with six input linear algebra kernel scripts. The resulting geomean speedup of 5.10× on a 24-core machine shows the potential of the generalized approach for automatically extracting parallelism in dynamic scripting languages.
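The key insight can be illustrated with a toy sketch. The code below is not the PEP implementation: the two-opcode script format, the names `interpret`, `specialize`, and `run_parallel`, and the use of a thread pool are all assumptions made for illustration. It specializes a tiny interpreter to one fixed script, producing a residual function with no opcode dispatch, and then applies that function to independent inputs in parallel. The thread pool here is a hand-written stand-in for the automatic, speculative parallelization the paper describes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy "script": multiply each input by 3, then add 1.
SCRIPT = [("mul", 3), ("add", 1)]


def interpret(script, x):
    """Generic interpreter: dispatches on every opcode at run time."""
    for op, arg in script:
        if op == "mul":
            x = x * arg
        elif op == "add":
            x = x + arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return x


def specialize(script):
    """Specialize the interpreter to one fixed script.

    The dispatch loop is unrolled at specialization time, so the residual
    function contains only the straight-line arithmetic of the script.
    """
    lines = ["def residual(x):"]
    for op, arg in script:
        if op == "mul":
            lines.append(f"    x = x * {arg!r}")
        elif op == "add":
            lines.append(f"    x = x + {arg!r}")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    lines.append("    return x")
    namespace = {}
    exec("\n".join(lines), namespace)  # build the residual program
    return namespace["residual"]


def run_parallel(fn, data, workers=4):
    """Apply fn to independent inputs; stands in for a parallelized loop."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, data))


residual = specialize(SCRIPT)
data = list(range(8))
sequential = [interpret(SCRIPT, x) for x in data]
parallel = run_parallel(residual, data)
```

In the generic interpreter, the independence of the loop iterations is hidden behind opcode dispatch. In the residual function the script's own structure is explicit, which is the property the abstract relies on. Threads in standard CPython will not speed up arithmetic like this, so the sketch shows only the shape of the transformation, not the performance.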
