Generic discrimination: sorting and partitioning unshared data in linear time

We introduce the notion of discrimination as a generalization of both sorting and partitioning, and show that worst-case linear-time discrimination functions (discriminators) can be defined generically, by (co-)induction on an expressive language of order denotations. The generic definition yields discriminators that generalize both distributive sorting and multiset discrimination. The generic discriminator can be coded compactly using list comprehensions, with order denotations specified using Generalized Algebraic Data Types (GADTs). A GADT-free combinator formulation of discriminators is also given. We give some examples of the uses of discriminators, including a new most-significant-digit lexicographic sorting algorithm. Discriminators generalize binary comparison functions: they operate on n arguments at a time, but expose no more information than the underlying equivalence or ordering relation on the arguments. We argue that primitive types with equality (such as references in ML) and ordered types (such as the machine integer type) should expose their equality or standard ordering relation, respectively, as discriminators. With only a binary equality test on a type, finding all occurrences of each element of a list of length n takes Θ(n²) time, even if the equality test itself takes only constant time; a discriminator accomplishes this in linear time. Likewise, with only a (constant-time) comparison function, sorting a list of n elements takes Θ(n log n) time; a discriminator can do this in linear time.
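To make the idea concrete, the following is a minimal sketch of what a discriminator interface could look like. It is not the paper's generic definition, which is expressed with list comprehensions and GADTs; it is an illustration in Python under stated assumptions. The names `disc_nat`, `disc_pair` and `dsort` are hypothetical, keys are assumed to be small non-negative integers (or pairs of them), and the bucket table is reallocated on every call, so this naive version costs O(n + bound) per call and does not reproduce the worst-case linear-time bound for nested use.

```python
from typing import Callable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")

# Hypothetical type alias: a discriminator takes key-value pairs and returns
# the values grouped by key, without ever returning the keys themselves.
Disc = Callable[[List[Tuple[K, V]]], List[List[V]]]


def disc_nat(bound: int) -> Disc:
    """Discriminator for integer keys in range(bound), using bucketing.

    Groups come out in ascending key order; values within a group keep
    their input order.
    """
    def disc(pairs):
        buckets = [[] for _ in range(bound)]  # fresh table on every call
        for key, value in pairs:
            buckets[key].append(value)
        return [bucket for bucket in buckets if bucket]
    return disc


def disc_pair(d1: Disc, d2: Disc) -> Disc:
    """Lexicographic product: group on the first component with d1,
    then refine every group on the second component with d2."""
    def disc(pairs):
        groups = d1([(k1, (k2, value)) for (k1, k2), value in pairs])
        return [refined for group in groups for refined in d2(group)]
    return disc


def dsort(d: Disc, items: list) -> list:
    """Sort by using each item as both key and value, then flattening."""
    return [x for group in d([(x, x) for x in items]) for x in group]


if __name__ == "__main__":
    tagged = [(3, "c"), (1, "a"), (3, "d"), (2, "b"), (1, "e")]
    print(disc_nat(4)(tagged))  # [['a', 'e'], ['b'], ['c', 'd']]
    pairs = [(2, 1), (0, 2), (2, 0), (0, 2)]
    print(dsort(disc_pair(disc_nat(3), disc_nat(3)), pairs))
    # [(0, 2), (0, 2), (2, 0), (2, 1)]
```

Note that a single call to `disc_nat(4)` groups all values with equal keys in one pass over the input, which is the task that costs Θ(n²) when only a pairwise equality test is available. The caller sees only the grouping and its order, not the keys.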
