A Machine Independent Algorithm for Code Generation and Its Use in Retargetable Compilers

This dissertation presents a method for the construction of efficient code generators for high-level procedural programming languages from a symbolic description of the instruction set of the target computer. A table-driven algorithm is given that translates a relatively low-level intermediate representation of a program into assembly or machine code for the target computer. A construction algorithm is presented that produces the required tables from a functional description of the target machine. By supplying an appropriate machine description, new tables can easily be created, thus retargeting a compiler for the new computer. Techniques are developed to prove the correctness of the resulting code generator based on the instruction set description. The output of the front end of the compiler is assumed to be a linearized intermediate representation (IR) of the source program consisting of a sequence of parenthesis-free prefix expressions. Implementation decisions concerning representation and storage allocation, as well as all but the low-level, machine-dependent optimizations, are already incorporated into the IR. Each machine instruction is described by a prefix expression and an assembly or machine language template. The code generation algorithm performs pattern matching similar to parsing. However, unlike the situation in syntax analysis, target machine descriptions are normally highly ambiguous. By defining a property called uniformity, which is satisfied by most instruction sets, it is possible to give a concise characterization of the sequence of prefix expressions computed by an instruction set, to check that all possible inputs to the code generator fall within this class, and to produce a left-to-right deterministic linear-time code generator.
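As an illustration of describing instructions by prefix patterns with templates, and of the preference for longer patterns over shorter ones, the following is a minimal sketch only. It is not the table-driven algorithm of the dissertation: it substitutes a simple backtracking recursive matcher for the generated tables. The machine description, the mnemonics, the symbols "r" (register), "k" (constant), and "@" (memory fetch), and the function names are all hypothetical.

```python
# Hypothetical machine description: (prefix pattern, assembly template).
# "r" = a subexpression computed into a register, "k" = an integer constant;
# every other symbol is an IR operator that must match literally.
MACHINE = [
    (("@", "+", "k", "r"), "load {d}, {0}({1})"),  # indexed fetch
    (("@", "r"),           "load {d}, 0({0})"),    # plain fetch
    (("+", "r", "k"),      "addi {d}, {0}, {1}"),  # add immediate
    (("+", "r", "r"),      "add {d}, {0}, {1}"),   # register add
    (("k",),               "li {d}, {0}"),         # load constant
]


def match(tokens, pos, next_reg, patterns):
    """Cover one prefix expression starting at tokens[pos].

    Returns (code, result_register, new_pos, next_reg), or None if no
    pattern applies. Patterns are tried in the order given.
    """
    for pattern, template in patterns:
        code, operands, p, n = [], [], pos, next_reg
        for sym in pattern:
            if sym == "r":
                # Recursively cover the operand subexpression.
                sub = match(tokens, p, n, patterns)
                if sub is None:
                    break
                sub_code, reg, p, n = sub
                code += sub_code
                operands.append(reg)
            elif sym == "k":
                if p >= len(tokens) or not isinstance(tokens[p], int):
                    break
                operands.append(tokens[p])
                p += 1
            else:
                if p >= len(tokens) or tokens[p] != sym:
                    break
                p += 1
        else:
            # Whole pattern matched: emit its template into a fresh register.
            dest = f"r{n}"
            code.append(template.format(*operands, d=dest))
            return code, dest, p, n + 1
    return None


def generate(tokens, prefer_longer=True):
    """Generate code for one prefix expression given as a token list."""
    patterns = sorted(MACHINE, key=lambda m: len(m[0]), reverse=prefer_longer)
    result = match(tokens, 0, 0, patterns)
    if result is None or result[2] != len(tokens):
        raise ValueError("IR not covered by the machine description")
    return result[0]


if __name__ == "__main__":
    ir = ["@", "+", 8, "@", 100]  # fetch mem[8 + mem[100]]
    print("\n".join(generate(ir)))
```

With this toy description, trying longer patterns first covers the expression in three instructions by using the indexed fetch, whereas calling `generate(ir, prefer_longer=False)` yields five because the address addition is emitted separately.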
Ambiguities in the machine description are resolved in favor of choosing longer instruction patterns over shorter ones, thus effectively attempting to produce the object program that is shortest in terms of the number of instructions generated while containing the same sequence of operations. In practice this heuristic works very well. In comparison with existing compilers, the code generated by this algorithm is of equal or better quality (in terms of the size of the code produced). The instances in which existing compilers produce superior code stem from optimizations, i.e., changes in the sequence of operations, that were not employed in this work. Most of these optimizations could be combined with our method of code generation. The code generation routines for most existing compilers are written by hand and use sequences of instructions identified by the implementer. By choosing code sequences in a systematic algorithmic fashion, our code generators are more consistent and more successful in using the full range of machine instructions, including many special purpose instructions.

Professor Susan L. Graham, Chairman of Committee

Research sponsored by National Science Foundation Grant MCS74-07644-A03.
