Procedural Abstraction with Reverse Prefix Trees

For memory-constrained environments such as embedded systems, optimizing for size is often as important as, if not more important than, optimizing for execution speed. A common technique for compacting code is procedural abstraction: equivalent code fragments are identified and abstracted into a procedure. The standard algorithm for identifying these fragments is based on suffix trees. In this paper, we propose constructing suffix trees over the program text not in the usual top-down fashion but in reverse, i.e. bottom-up. With this simple modification, not only can equivalent fragments be identified, but also fragments that are equivalent to suffixes (possibly of many different lengths) of the longest fragments. A longest fragment is then abstracted, and every fragment is replaced by a call to its corresponding start instruction within the abstracted procedure. This allows us to harvest more and longer fragments than with standard suffix trees, improving code size reductions by 8.277% on average over standard suffix trees.
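
The core idea can be illustrated with a toy sketch. It assumes instructions are opaque hashable tokens, and it ignores basic-block boundaries, branches, register liveness and any real cost model (it simply keeps fragments of at least `min_len` instructions). Instead of a linear-time suffix tree it uses a naive trie over the reversed program text, capped at `max_len` instructions per entry. All names (`Node`, `build_reverse_trie`, `deepest_repeated_path`, `abstract_longest`) are hypothetical and are not taken from the paper. Because each entry is read backwards from an end position, fragments that share a common tail share a root path. The deepest repeated node therefore yields the procedure body, and every shallower node on that path yields a suffix that can be reached by calling into the middle of the body.

```python
class Node:
    """Node of a naive reversed suffix trie (hypothetical helper, not from the paper)."""

    def __init__(self):
        self.children = {}  # instruction -> child Node
        self.ends = []      # end positions e of fragments spelled by the root-to-node path


def build_reverse_trie(code, max_len):
    """For every end position e, insert code[e], code[e-1], ... read backwards
    (at most max_len instructions). A node at depth d stands for the fragment
    code[e-d+1 : e+1] for every e in node.ends."""
    root = Node()
    for e in range(len(code)):
        node = root
        for k in range(min(max_len, e + 1)):
            node = node.children.setdefault(code[e - k], Node())
            node.ends.append(e)  # ends stay sorted because e only increases
    return root


def deepest_repeated_path(root):
    """Return the root-to-node path to the deepest node whose fragment occurs at
    least twice without overlap (first and last occurrence at least d apart)."""
    best, stack = [], [(root, [])]
    while stack:
        node, path = stack.pop()
        for child in node.children.values():
            p = path + [child]
            ends = child.ends
            # Descendants have fewer ends and greater depth, so if this node
            # fails the test they fail too and the search can prune here.
            if len(ends) >= 2 and ends[-1] - ends[0] >= len(p):
                if len(p) > len(best):
                    best = p
                stack.append((child, p))
    return best


def abstract_longest(code, max_len=16, min_len=2):
    """Use the longest repeated fragment as the procedure body and map every
    occurrence of one of its suffixes to a call at the matching entry offset."""
    path = deepest_repeated_path(build_reverse_trie(code, max_len))
    if not path:
        return None
    length = len(path)
    last = path[-1].ends[0]
    proc = code[last - length + 1 : last + 1]
    # Longest suffix of proc ending at e: deeper nodes overwrite shallower ones.
    match = {}
    for depth, node in enumerate(path, start=1):
        for e in node.ends:
            match[e] = depth
    # Greedily keep long, non-overlapping occurrences (toy threshold: min_len).
    chosen = []
    for e, d in sorted(match.items(), key=lambda kv: (-kv[1], kv[0])):
        start = e - d + 1
        if d >= min_len and all(e < s or start > t for s, t in chosen):
            chosen.append((start, e))
    # Each call: (start of replaced fragment, entry offset into proc).
    calls = [(s, length - (t - s + 1)) for s, t in sorted(chosen)]
    return proc, calls


code = ["a", "b", "c", "d", "x", "b", "c", "d", "y",
        "a", "b", "c", "d", "z", "c", "d"]
proc, calls = abstract_longest(code)
print(proc)   # ['a', 'b', 'c', 'd']
print(calls)  # [(0, 0), (5, 1), (9, 0), (14, 2)]
```

In this example, `b c d` at position 5 and `c d` at position 14 are both suffixes of the abstracted body `a b c d`, so they become calls that enter the body at offsets 1 and 2 and share its return. A top-down suffix tree groups fragments by common prefix instead, so it would not relate these shorter tails to the same procedure body.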
