Using Multiset Discrimination to Solve Language Processing Problems Without Hashing

It is generally assumed that hashing is essential to solve many language processing problems efficiently, e.g., symbol table formation and maintenance, grammar manipulation, basic block optimization, and global optimization. This paper questions this assumption and initiates development of an efficient alternative compiler methodology without hashing or sorting. The methodology rests on efficient solutions to the basic problem of detecting duplicate values in a multiset, which we call multiset discrimination. Paige and Tarjan (1987) gave an efficient solution to multiset discrimination for detecting duplicate elements occurring in a multiset of varying-length strings. The technique was used to develop an improved algorithm for lexicographic sorting, whose importance stems largely from its use in solving a variety of isomorphism problems (Aho et al., 1974). The current paper and a related paper (Paige, 1994) show that full lexicographic sorting is not needed to solve these isomorphism problems, because they can be solved more efficiently using straightforward extensions to the simpler multiset discrimination technique. By reformulating language processing problems in terms of multiset discrimination, we also show how almost every subtask of compilation can be solved without hashing in worst-case running time no worse (and frequently better) than the best previous expected-time solution (under the assumption that one hash operation takes unit expected time). Because of their simplicity, our solutions may be of practical as well as theoretical interest. The various applications presented culminate with a new algorithm to solve iterated strength reduction folded with useless code elimination that runs in worst-case asymptotic time and auxiliary space Θ(|L| + |L'|), where |L| and |L'| represent the lengths of the initial and optimized programs, respectively. The previous best solution, due to Cocke and Kennedy (1977), takes Ω(|L|^3 |L'|) hash operations in the worst case.
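To make the basic problem concrete, the following is a minimal sketch of multiset discrimination on strings. It is an illustration under stated assumptions, not the algorithm of Paige and Tarjan or of this paper: the function name `discriminate`, the `alphabet_size` parameter, and the restriction to characters with codes below `alphabet_size` are all hypothetical choices made for the example. Equal strings are grouped by refining blocks one character position at a time through an array of buckets indexed by character code, so no hash table and no comparison sort is involved, and the groups are not produced in sorted order.

```python
def discriminate(strings, alphabet_size=256):
    """Group the indices of `strings` so that equal strings share a group.

    Assumption: every character code is below `alphabet_size`.
    Only array indexing is used; there is no hashing and no sorting.
    """
    buckets = [None] * alphabet_size          # reused for every refinement step
    finished = []                             # groups of indices of identical strings
    work = [(list(range(len(strings))), 0)]   # (block of indices, position to inspect)
    while work:
        block, pos = work.pop()
        ended = []     # strings in this block that end exactly at `pos`
        touched = []   # codes of the buckets filled in this step
        for i in block:
            s = strings[i]
            if len(s) == pos:
                ended.append(i)
            else:
                c = ord(s[pos])
                if buckets[c] is None:
                    buckets[c] = []
                    touched.append(c)
                buckets[c].append(i)
        if ended:
            finished.append(ended)
        # Visit only the touched buckets, so a step costs time proportional
        # to the block size rather than to the alphabet size.
        for c in touched:
            work.append((buckets[c], pos + 1))
            buckets[c] = None
    return finished


groups = discriminate(["ab", "a", "ab", "", "b", "a"])
duplicates = [g for g in groups if len(g) > 1]
```

Each group holds the positions of one distinct string in the input, and a group with more than one index marks a duplicated value. Keeping the `touched` list is the design choice that avoids scanning the whole bucket array after every step.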

[1] Wuu Yang, et al. Detecting Program Components With Equivalent Behaviors, 1989.

[2] Jonathan Rees, et al. Macros that work, 1991, POPL '91.

[3] Mike Paterson, et al. Linear unification, 1976, STOC '76.

[4] Edmond Schonberg, et al. Taliere: an interactive system for data structuring SETL programs, 1988.

[5] J. Cocke. Global common subexpression elimination, 1970, Symposium on Compiler Optimization.

[6] Wuu Yang, et al. A new algorithm for semantics-based program integration, 1990.

[7] Mark N. Wegman, et al. Constant propagation with conditional branches, 1985, POPL.

[8] Daniel J. Rosenkrantz, et al. Compiler design theory, 1976.

[9] John Cocke, et al. Programming languages and their compilers, 1969.

[10] Robert Paige, et al. Real-time Simulation of a Set Machine on a RAM, 1989.

[11] Bowen Alpern, et al. Detecting equality of variables in programs, 1988, POPL '88.

[12] Robert E. Tarjan, et al. A Linear Time Solution to the Single Function Coarsest Partition Problem, 1985, Theor. Comput. Sci.

[13] Robert Paige, et al. Symbolic Finite Differencing - Part I, 1990, ESOP.

[14] Robert E. Tarjan, et al. Depth-First Search and Linear Graph Algorithms, 1972, SIAM J. Comput.

[15] Stephen Warshall, et al. A Theorem on Boolean Matrices, 1962, JACM.

[16] Donald E. Knuth, et al. The art of computer programming: V.1.: Fundamental algorithms, 1997.

[17] Robert E. Tarjan, et al. A Class of Algorithms which Require Nonlinear Time to Maintain Disjoint Sets, 1979, J. Comput. Syst. Sci.

[18] Robert E. Tarjan, et al. Variations on the Common Subexpression Problem, 1980, J. ACM.

[19] John E. Hopcroft, et al. An n log n algorithm for minimizing states in a finite automaton, 1971.

[20] Eduardo Pelegri-Llopart, et al. Rewrite systems, pattern matching, and code generation, 1988.

[21] Dominique Revuz, et al. Minimisation of Acyclic Deterministic Automata in Linear Time, 1992, Theor. Comput. Sci.

[22] Christoph M. Hoffmann, et al. Pattern Matching in Trees, 1982, JACM.

[23] Robert Paige, et al. Efficient Translation of External Input in a Dynamically Typed Language, 1994, IFIP Congress.

[24] Dan E. Willard, et al. Quasilinear Algorithms for Processing Relational Calculus Expressions, 1990, PODS 1990.

[25] Edmond Schonberg, et al. Programming with Sets: An Introduction to SETL, 1986.

[26] Alfred V. Aho, et al. The Design and Analysis of Computer Algorithms, 1974.

[27] Robert E. Tarjan, et al. Data structures and network algorithms, 1983, CBMS-NSF regional conference series in applied mathematics.

[28] Robert Paige, et al. Look ma, no hashing, and no arrays neither, 1991, POPL '91.

[29] Harry G. Mairson. The program complexity of searching a table, 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[30] Robert E. Tarjan, et al. Three Partition Refinement Algorithms, 1987, SIAM J. Comput.

[31] Ron Cytron, et al. Code motion of control structures in high-level languages, 1986, POPL '86.

[32] Jay Earley, et al. High Level Iterators and a Method for Automatically Designing Data Structure Representation, 1976, Comput. Lang.

[33] Ken Kennedy, et al. An algorithm for reduction of operator strength, 1977, Commun. ACM.

[34] Larry Carter, et al. Universal Classes of Hash Functions, 1979, J. Comput. Syst. Sci.