Using difficulty of prediction to decrease computation: fast sort, priority queue and convex hull on entropy bounded inputs

Studies have indicated that sorting accounts for about 20% of all computing on mainframes. Perhaps the largest use of sorting in computing (particularly business computing) is the sort required for large database operations (e.g. by join operations). In these applications the keys are many words long. Since our sorting algorithm hashes the key (rather than comparing entire keys, as comparison sorts such as quicksort do), it is even more advantageous for large key lengths; in that case the cutoff is much lower. If the compression ratio, which can be determined after building the dictionary, turns out to be high, we simply fall back on an existing sorting algorithm, e.g. quicksort. The same techniques can be extended to other problems (e.g. computational geometry problems) to decrease computation by learning the distribution of the inputs.
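
The hash-then-fall-back idea can be illustrated with a minimal Python sketch. It is not the algorithm of the paper: the dictionary here is a plain hash table of key counts, the fraction of distinct keys serves as a crude stand-in for the compression ratio, and the names `dictionary_sort` and `cutoff` (with its default of 0.5) are hypothetical choices made only for this example.

```python
from collections import Counter


def dictionary_sort(keys, cutoff=0.5):
    """Sort keys, returning (sorted_list, strategy_used).

    Illustrative only: `cutoff` and the distinct-key ratio are assumptions,
    not quantities defined in the paper.
    """
    if not keys:
        return [], "empty"

    # Build the dictionary: each key is hashed once, so long keys are not
    # compared against each other repeatedly at this stage.
    counts = Counter(keys)

    # Crude proxy for the compression ratio: a low value means the input is
    # highly repetitive (low entropy), a high value means little redundancy.
    ratio = len(counts) / len(keys)

    if ratio > cutoff:
        # Little redundancy to exploit: fall back on a comparison sort.
        return sorted(keys), "fallback"

    # Only the distinct keys are compared; duplicates are expanded by count.
    out = []
    for key in sorted(counts):
        out.extend([key] * counts[key])
    return out, "dictionary"
```

On a repetitive input such as `["b", "a", "b", "a", "b", "a"]` the sketch takes the dictionary path, whereas on `[3, 1, 2]`, where every key is distinct, it falls back on the built-in comparison sort.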
