Analytic Variations on the Common Subexpression Problem

Any tree can be represented in a maximally compact form as a directed acyclic graph in which common subtrees are factored and shared, so that each is represented only once. Such a compaction can be effected in linear time. It is used to save storage in implementations of functional programming languages, as well as in symbolic manipulation and computer algebra systems. In compiling, the compaction problem is known as the “common subexpression problem”, and it plays a central role in register allocation, code generation and optimisation. We establish here that, under a variety of probabilistic models, a tree of size n has a compacted form whose expected size is asymptotically $$C\frac{n}{\sqrt{\log n}},$$ where the constant C is explicitly related to the type of trees to be compacted and to the statistical model reflecting tree usage. In particular, since the ratio of compacted to original size behaves like $C/\sqrt{\log n}$ and therefore tends to 0, the savings in storage approach 100% on average for large structures. This outperforms the commonly used form of sharing that is restricted to leaves (atoms), which still stores every internal node.
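The bottom-up sharing described above can be sketched in Python. This is a minimal sketch, assuming a plain hash table (a Python `dict`) to detect repeated subtrees, so it runs in expected rather than worst-case linear time. It also counts the size under leaf-only sharing for comparison. The names `Tree`, `compact`, `leaf_shared_size` and `complete` are hypothetical, chosen for illustration.

```python
from typing import Dict, Tuple


class Tree:
    """A rooted ordered tree: a label plus a tuple of children (empty for leaves)."""

    def __init__(self, label: str, *children: "Tree") -> None:
        self.label = label
        self.children = children


def tree_size(t: Tree) -> int:
    """Number of nodes in the uncompacted tree."""
    return 1 + sum(tree_size(c) for c in t.children)


def compact(t: Tree) -> Tuple[int, Dict[Tuple, int]]:
    """Share identical subtrees bottom-up (hash-consing).

    A node's key is its label together with the ids already assigned to
    its children, so two subtrees get the same id exactly when they are
    structurally equal. Returns the root id and the table of distinct
    nodes, i.e. the vertices of the compacted DAG.
    """
    table: Dict[Tuple, int] = {}

    # Recursive post-order walk; fine for trees of moderate height.
    def visit(node: Tree) -> int:
        key = (node.label, tuple(visit(c) for c in node.children))
        if key not in table:
            table[key] = len(table)  # first occurrence: allocate a new DAG vertex
        return table[key]

    return visit(t), table


def leaf_shared_size(t: Tree) -> int:
    """Size under leaf-only sharing: every internal node is kept,
    but each distinct leaf label (atom) is stored once."""
    internal, atoms = 0, set()
    stack = [t]
    while stack:
        node = stack.pop()
        if node.children:
            internal += 1
            stack.extend(node.children)
        else:
            atoms.add(node.label)
    return internal + len(atoms)


def complete(h: int) -> Tree:
    """Hypothetical test input: complete binary tree of height h, leaves labelled 'a'."""
    if h == 0:
        return Tree("a")
    return Tree("f", complete(h - 1), complete(h - 1))


if __name__ == "__main__":
    t = Tree("f", Tree("g", Tree("a")), Tree("g", Tree("a")))
    print(tree_size(t), len(compact(t)[1]), leaf_shared_size(t))  # 5 3 4
    c = complete(10)
    print(tree_size(c), len(compact(c)[1]), leaf_shared_size(c))  # 2047 11 1024
```

On the complete binary tree of height 10, full sharing keeps 11 of 2047 nodes, while leaf-only sharing keeps 1024 because every internal node survives. This is an extreme, highly repetitive input rather than a sample from the random tree models analysed here, so it illustrates the mechanism, not the $C\,n/\sqrt{\log n}$ estimate.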
