Code generation and optimization for embedded digital signal processors

The advent of deep submicron processing technology has made it possible and desirable to integrate a processor core, a program ROM, and application-specific circuitry all on a single IC. As the complexity of embedded software grows, high-level languages such as C and C++ are increasingly employed in writing embedded software. Consequently, high-level language compilers have become an essential tool in the development of embedded systems. Fixed-point digital signal processors are among the most commonly embedded cores, due to their favorable performance–cost characteristics. However, these architectures are usually designed and optimized for their application domain, and pose challenges for compiler technology. Traditional compiler optimizations, though necessary, are insufficient for generating efficient and compact code. Therefore, new optimizations are required to produce code of the highest quality in a reasonable amount of time. In this thesis the author presents techniques for code generation and optimization that target embedded digital signal processors. These techniques have proven to be effective in improving the performance and reducing the size of compiled software. This thesis emphasizes optimization techniques; only by gaining a deeper understanding of the problems involved can we then apply them to a wider class of architectures.

Keywords: compiler optimizations, digital signal processors, embedded systems.

Thesis Supervisor: Srinivas Devadas
Title: Associate Professor of Electrical Engineering and Computer Science
