Paper Information - Feasibility of Dynamic Binary Parallelization

Feasibility of Dynamic Binary Parallelization

This paper proposes dynamic binary parallelization (DBP), an automatic technique that transparently parallelizes a sequential binary executable while it is running. In simulation, a prototype implementation sped up sequential execution by up to 1.96x, averaged over three benchmark suites.
