Building a Control-flow Graph from Scheduled Assembly Code

A variety of applications have arisen where it is worthwhile to apply code optimizations directly to the machine code (or assembly code) produced by a compiler. These include link-time whole-program analysis and optimization, code compression, binary-to-binary translation, and bit-transition reduction (for power). Many, if not most, optimizations assume the presence of a control-flow graph (CFG). Compiled, scheduled code has properties that can make CFG construction more complex than it is inside a typical compiler. In this paper, we examine the problems of scheduled code on architectures that have multiple delay slots. In particular, if branch delay slots contain other branches, the classic algorithms for building a CFG produce incorrect results. We explain the problem using two simple examples. We then present an algorithm for building correct CFGs from scheduled assembly code that includes branches in branch-delay slots. The algorithm works by building an approximate CFG and then refining it to reflect the actions of delayed branches. If all branches have explicit targets, the complexity of the refining step is linear with respect to the number of branches in the code. Analysis of the kind presented in this paper is a necessary first step for any system that analyzes or translates compiled, assembly-level code. We have implemented this algorithm as part of our power-consumption experiments, which target the TMS320C6200 architecture from Texas Instruments. The development of our algorithm was motivated by the output of TI's compiler.

Authors' address: Department of Computer Science; Rice University, MS 132; Houston, TX, USA 77005.
Corresponding author: waterman@rice.edu
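To make the two-pass idea concrete, the Python sketch below first computes the classic, approximate set of block leaders (as if branches took effect immediately), then refines the result so that each branch transfers control only after its delay slots. The instruction tuple, the uniform DELAY_SLOTS value, the single "br" opcode, and the function names are assumptions made for illustration; this is a minimal sketch of the idea described in the abstract, not the paper's actual algorithm or the TMS320C6200 instruction encoding.

from collections import namedtuple

# Illustrative instruction form: optional label, opcode, optional branch target.
Instr = namedtuple("Instr", ["label", "op", "target"])

DELAY_SLOTS = 2        # assumed uniform number of branch delay slots
BRANCH_OPS = {"br"}    # assumed branch opcode

def approximate_leaders(instrs):
    """Pass 1: classic leader computation, pretending branches take effect
    immediately and delay slots do not exist."""
    labels = {ins.label: i for i, ins in enumerate(instrs) if ins.label}
    lead = {0}
    for i, ins in enumerate(instrs):
        if ins.op in BRANCH_OPS:
            if ins.target in labels:
                lead.add(labels[ins.target])   # explicit branch target
            if i + 1 < len(instrs):
                lead.add(i + 1)                # fall-through point
    return lead, labels

def refine_for_delay_slots(instrs, lead, labels):
    """Pass 2: a branch at index i really transfers control after its delay
    slots, i.e. after instruction i + DELAY_SLOTS.  Move the block split and
    the outgoing edges to that point.  Because every branch is visited, a
    branch sitting inside another branch's delay slots also contributes its
    own (later) split point and edges."""
    splits, edges = set(lead), set()
    for i, ins in enumerate(instrs):
        if ins.op not in BRANCH_OPS:
            continue
        effect = min(i + DELAY_SLOTS, len(instrs) - 1)   # last delay-slot index
        if ins.target in labels:
            edges.add((effect, labels[ins.target]))      # taken edge
        if effect + 1 < len(instrs):
            splits.add(effect + 1)
            edges.add((effect, effect + 1))              # fall-through edge
    return sorted(splits), sorted(edges)

if __name__ == "__main__":
    # A branch whose delay slots contain another branch: the case the
    # abstract identifies as breaking the classic algorithm.
    code = [
        Instr(None, "add", None),   # 0
        Instr(None, "br",  "L1"),   # 1: control leaves after instruction 3
        Instr(None, "br",  "L2"),   # 2: branch inside instruction 1's delay slots
        Instr(None, "add", None),   # 3
        Instr(None, "add", None),   # 4
        Instr(None, "add", None),   # 5
        Instr("L1", "add", None),   # 6
        Instr("L2", "add", None),   # 7
    ]
    lead, labels = approximate_leaders(code)
    print(refine_for_delay_slots(code, lead, labels))

In the example above, the refinement places the first branch's outgoing edges after instruction 3 and the second branch's after instruction 4, rather than immediately after the branch instructions themselves, which is the distinction the paper's refinement step captures.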
