Polyhedral parallelization of binary code

Many automatic software parallelization systems have been proposed in the past decades, but most of them are dedicated to source-to-source transformations. This paper shows that parallelizing executable programs is feasible, even if they require complex transformations, and in effect decouples parallelization from compilation, for example, for closed-source or legacy software, where binary code is the only available representation. We propose an automatic parallelizer, which is able to perform advanced parallelization on binary code. It first parses the binary code and extracts high-level information. From this information, a C program is generated. This program captures only a subset of the program semantics, namely, loops and memory accesses. This C program is then parallelized using existing, state-of-the-art parallelizers, including advanced polyhedral parallelizers. The original program semantics is then re-injected, and the transformed parallel loop nests are recompiled by a standard C compiler. We show on the PolyBench benchmark suite that our system successfully detects and parallelizes almost all the loop nests from the binary code, using a recent polyhedral loop parallelizer as a backend. The paper ends by elaborating a strategy to parallelize more complex programs, such as those containing non-linear accesses to memory, and provides a few example case-studies.

[1]  P. Feautrier Parametric integer programming , 1988 .

[2]  Mark N. Wegman,et al.  Efficiently computing static single assignment form and the control dependence graph , 1991, TOPL.

[3]  Steven S. Muchnick,et al.  Advanced Compiler Design and Implementation , 1997 .

[4]  Andrew W. Appel,et al.  Modern Compiler Implementation in ML , 1997 .

[5]  Vincent Loechner,et al.  Parametric Analysis of Polyhedral Iteration Spaces , 1998, J. VLSI Signal Process..

[6]  Takashi Yokota,et al.  Preliminary evaluation of a binary translation system for multithreaded processors , 2002, International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems.

[7]  Philippe Clauss,et al.  A Symbolic Approach to Bernstein Expansion for Program Analysis and Optimization , 2004, CC.

[8]  Cédric Bastoul,et al.  Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[9]  Paul Feautrier,et al.  Some efficient solutions to the affine scheduling problem. I. One-dimensional time , 1992, International Journal of Parallel Programming.

[10]  K. De Bosschere,et al.  DIABLO: a reliable, retargetable and extensible link-time rewriting framework , 2005, Proceedings of the Fifth IEEE International Symposium on Signal Processing and Information Technology, 2005..

[11]  Wei Liu,et al.  POSH: a TLS compiler that exploits program structure , 2006, PPoPP '06.

[12]  Vincent Loechner,et al.  Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions , 2007, Algorithmica.

[13]  Michael Franz,et al.  Dynamic parallelization and mapping of binary executables on hierarchical platforms , 2006, CF '06.

[14]  Gregory R. Andrews,et al.  PLTO: A Link-Time Optimizer for the Intel IA-32 Architecture , 2007 .

[15]  Uday Bondhugula,et al.  A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.

[16]  Rudolf Eigenmann,et al.  Cetus: A Source-to-Source Compiler Infrastructure for Multicores , 2009, Computer.

[17]  Philippe Clauss,et al.  Symbolic Polynomial Maximization Over Convex Sets and Its Application to Memory Requirement Estimation , 2009, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[18]  Rajeev Barua,et al.  Automatic Parallelization in a Binary Rewriter , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.

[19]  Sven Verdoolaege,et al.  isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.

[20]  Uday Bondhugula,et al.  Combined iterative and model-driven optimization in an automatic parallelization framework , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[21]  Michael Laurenzano,et al.  PEBIL: Efficient static binary instrumentation for Linux , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).

[22]  Seon Wook Kim,et al.  Runtime parallelization of legacy code on a transactional memory system , 2011, HiPEAC.

[23]  Kunle Olukotun,et al.  Runtime automatic speculative parallelization , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[24]  Kevin Skadron,et al.  Feasibility of Dynamic Binary Parallelization , 2011 .