Feasibility of Dynamic Binary Parallelization

This paper proposes DBP, an automatic technique that transparently parallelizes a sequential binary executable while it is running. A prototype implementation in simulation was able to increase sequential execution speeds by up to 1.96x, averaged over three benchmarks suites.

[1]  Rudolf Eigenmann,et al.  Min-cut program decomposition for thread-level speculation , 2004, PLDI '04.

[2]  Jason Mars,et al.  MATS : Multicore Adaptive Trace Selection , 2008 .

[3]  Haitham Akkary,et al.  A dynamic multithreading processor , 1998, Proceedings. 31st Annual ACM/IEEE International Symposium on Microarchitecture.

[4]  Gregory J. Chaitin,et al.  Register allocation and spilling via graph coloring , 2004, SIGP.

[5]  Diego R. Llanos Ferraris,et al.  Just-In-Time Scheduling for Loop-based Speculative Parallelization , 2008, 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP 2008).

[6]  Teresa H. Meng,et al.  Embracing heterogeneity: parallel programming for changing hardware , 2009 .

[7]  Sanjay J. Patel,et al.  rePLay: A Hardware Framework for Dynamic Optimization , 2001, IEEE Trans. Computers.

[8]  Nathan Clark Why Should I Rewrite My Software When Dynamic Compilation Can Be Good Enough ? , 2008 .

[9]  Guilherme Ottoni,et al.  Automatic thread extraction with decoupled software pipelining , 2005, 38th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'05).

[10]  Vasanth Bala,et al.  Dynamo: a transparent dynamic optimization system , 2000, SIGP.

[11]  Rajiv Gupta,et al.  Copy or Discard execution model for speculative parallelization on multicores , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.

[12]  Wen-mei W. Hwu,et al.  Automatic Discovery of Coarse-Grained Parallelism in Media Applications , 2007, Trans. High Perform. Embed. Archit. Compil..

[13]  Gurindar S. Sohi,et al.  Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[14]  David A. Patterson,et al.  Computer Architecture: A Quantitative Approach , 1969 .

[15]  Wei Liu,et al.  POSH: a TLS compiler that exploits program structure , 2006, PPoPP '06.

[16]  Easwaran Raman,et al.  Parallel-stage decoupled software pipelining , 2008, CGO '08.

[17]  Daniel Gajski,et al.  Hypertool: A Programming Aid for Message-Passing Systems , 1990, IEEE Trans. Parallel Distributed Syst..

[18]  Nathan Clark,et al.  Commutativity analysis for software parallelization: letting program transformations see the big picture , 2009, ASPLOS.

[19]  Tarek S. Abdelrahman,et al.  The use of hardware transactional memory for the trace-based parallelization of recursive Java programs , 2009, PPPJ '09.

[20]  Tarek S. Abdelrahman,et al.  Automatic Trace-Based Parallelization of Java Programs , 2007, 2007 International Conference on Parallel Processing (ICPP 2007).

[21]  James E. Smith,et al.  Path-based next trace prediction , 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.

[22]  Weifeng Zhang,et al.  An event-driven multithreaded dynamic optimization framework , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).

[23]  Manoj Franklin,et al.  A general compiler framework for speculative multithreading , 2002, SPAA '02.

[24]  Erik R. Altman,et al.  Daisy: Dynamic Compilation For 10o?40 Architectural Compatibility , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.

[25]  Easwaran Raman,et al.  Speculative Decoupled Software Pipelining , 2007, 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007).

[26]  Rajeev Barua,et al.  Automatic Parallelization in a Binary Rewriter , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[27]  William J. Dally,et al.  Evaluating the Imagine stream architecture , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[28]  Anant Agarwal,et al.  Scalar operand networks: on-chip interconnect for ILP in partitioned architectures , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[29]  Chen Ding,et al.  Software behavior oriented parallelization , 2007, PLDI '07.

[30]  Scott Mahlke,et al.  Effective compiler support for predicated execution using the hyperblock , 1992, MICRO 1992.

[31]  Kevin Skadron,et al.  Federation: Repurposing scalar cores for out-of-order instruction issue , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[32]  Michael Franz,et al.  Tracing for web 3.0: trace compilation for the next generation web applications , 2009, VEE '09.

[33]  Jack W. Davidson,et al.  Secure and practical defense against code-injection attacks using software dynamic translation , 2006, VEE '06.

[34]  Michael Franz,et al.  Dynamic parallelization and mapping of binary executables on hierarchical platforms , 2006, CF '06.

[35]  Jaehyuk Huh,et al.  Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.

[36]  Wen-mei W. Hwu,et al.  A hardware mechanism for dynamic extraction and relayout of program hot spots , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).

[37]  Woody Lichtenstein,et al.  The multiflow trace scheduling compiler , 1993, The Journal of Supercomputing.

[38]  Engin Ipek,et al.  Core fusion: accommodating software diversity in chip multiprocessors , 2007, ISCA '07.

[39]  Tarek S. Abdelrahman,et al.  The potential of trace-level parallelism in Java programs , 2007, PPPJ.

[40]  Eric Rotenberg,et al.  Trace cache: a low latency approach to high bandwidth instruction fetching , 1996, Proceedings of the 29th Annual IEEE/ACM International Symposium on Microarchitecture. MICRO 29.

[41]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[42]  Ken Kennedy,et al.  Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .

[43]  Derek Bruening,et al.  An infrastructure for adaptive dynamic optimization , 2003, International Symposium on Code Generation and Optimization, 2003. CGO 2003..

[44]  Margaret Martonosi,et al.  Multipath execution: opportunities and limits , 1998, ICS '98.

[45]  Derek Bruening,et al.  Secure Execution via Program Shepherding , 2002, USENIX Security Symposium.

[46]  Scott A. Mahlke,et al.  Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[47]  Anshuman Dasgupta Vizer: A framework to analyze and vectorize Intel x86 binaries , 2003 .

[48]  Sanjay J. Patel,et al.  Increasing the size of atomic instruction blocks using control flow assertions , 2000, Proceedings 33rd Annual IEEE/ACM International Symposium on Microarchitecture. MICRO-33 2000.

[49]  Vivek Sarkar,et al.  Space-time scheduling of instruction-level parallelism on a raw machine , 1998, ASPLOS VIII.

[50]  Jing Yang,et al.  Dimension: an instrumentation tool for virtual execution environments , 2006, VEE '06.

[51]  Saman P. Amarasinghe,et al.  Convergent scheduling , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings..

[52]  Scott A. Mahlke,et al.  Region-based hierarchical operation partitioning for multicluster processors , 2003, PLDI '03.

[53]  Antonio González,et al.  Speculative multithreaded processors , 1998, ICS '98.

[54]  Dirk Grunwald,et al.  Instruction fetch mechanisms for multipath execution processors , 1999, MICRO-32. Proceedings of the 32nd Annual ACM/IEEE International Symposium on Microarchitecture.

[55]  David I. August,et al.  Decoupled software pipelining with the synchronization array , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[56]  Scott A. Mahlke,et al.  Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.

[57]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[58]  Paolo Faraboschi,et al.  DELI: a new run-time control point , 2002, MICRO.

[59]  José González,et al.  Dual path instruction processing , 2002, ICS '02.

[60]  Guilherme Ottoni,et al.  Global Multi-Threaded Instruction Scheduling , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[61]  Antonio González,et al.  Clustered speculative multithreaded processors , 1999, ICS '99.

[62]  Bradford L. Chamberlain,et al.  Parallel Programmability and the Chapel Language , 2007, Int. J. High Perform. Comput. Appl..

[63]  Wei Liu,et al.  Dynamic parallelization of single-threaded binary programs using speculative slicing , 2009, ICS.