Faster Adaptive Set Intersections for Text Searching

Intersecting large ordered sets is a common problem when evaluating boolean queries in a search engine. In this paper we engineer a better algorithm for this task, which improves on those proposed by Demaine, Munro and Lopez-Ortiz [SODA 2000/ALENEX 2001], by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study by Demaine et al. of comparison-based intersection algorithms. Second, we show that, in practice, replacing binary search and galloping (one-sided binary) search [4] with interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search, which result in an even better intersection algorithm.
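To make the second contribution concrete, the following is a minimal sketch of how the search routine can be swapped inside a simple two-set intersection loop. It is not the authors' implementation and does not reproduce the interpolation-search variants studied in the paper. The function names (`gallop`, `interpolation_search`, `intersect`) are hypothetical, and the sketch assumes the sets are stored as sorted Python lists of distinct integers.

```python
from bisect import bisect_left


def gallop(arr, target, lo=0):
    """Galloping (one-sided binary) search.

    Returns the smallest index i >= lo with arr[i] >= target,
    or len(arr) if no such index exists.
    """
    n = len(arr)
    if lo >= n or arr[lo] >= target:
        return lo
    # Invariant from here on: arr[lo] < target.
    step = 1
    hi = lo + step
    while hi < n and arr[hi] < target:
        lo = hi
        step *= 2          # double the jump each round
        hi = lo + step
    # The answer lies in (lo, min(hi, n)]; finish with binary search.
    return bisect_left(arr, target, lo + 1, min(hi, n))


def interpolation_search(arr, target, lo=0):
    """Interpolation search with the same contract as gallop().

    Assumes integer keys so that positions can be estimated linearly.
    """
    hi = len(arr) - 1
    if lo > hi or arr[hi] < target:
        return len(arr)
    if arr[lo] >= target:
        return lo
    # Invariant: arr[lo] < target <= arr[hi], hence arr[hi] > arr[lo].
    while hi - lo > 1:
        # Guess the position by linear interpolation between the endpoints.
        mid = lo + (target - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        # Clamp strictly inside (lo, hi) so the interval always shrinks.
        mid = min(max(mid, lo + 1), hi - 1)
        if arr[mid] < target:
            lo = mid
        else:
            hi = mid
    return hi


def intersect(a, b, search):
    """Intersect two sorted lists, using `search` to skip ahead."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = search(a, b[j], i)   # jump a forward to the first value >= b[j]
        else:
            j = search(b, a[i], j)   # jump b forward to the first value >= a[i]
    return out
```

For example, with `a = [1, 3, 4, 7, 9, 15, 21, 40]` and `b = [2, 3, 7, 8, 15, 16, 40, 41]`, both `intersect(a, b, gallop)` and `intersect(a, b, interpolation_search)` return `[3, 7, 15, 40]`. Because the two search functions share one contract, the intersection loop is unchanged when one is replaced by the other; only the number of probes per skip differs.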

[1] Alon Itai, et al. Interpolation search—a log log N search, 1978, CACM.

[2] Erik D. Demaine, et al. Adaptive set intersections, unions, and differences, 2000, SODA '00.

[3] Frank K. Hwang, et al. A Simple Algorithm for Merging Two Disjoint Linearly-Ordered Sets, 1972, SIAM J. Comput.

[4] Erik D. Demaine, et al. Interpolation search for non-independent data, 2004, SODA '04.

[5] Eugene W. Myers, et al. Suffix arrays: a new method for on-line string searches, 1993, SODA '90.

[6] Claire Mathieu, et al. Adaptive intersection and t-threshold problems, 2002, SODA '02.

[7] Frank K. Hwang. Optimal Merging of 3 Elements with n Elements, 1980, SIAM J. Comput.

[8] Ricardo Baeza-Yates, et al. Information Retrieval: Data Structures and Algorithms, 1992.

[9] Guy E. Blelloch, et al. Compact representations of ordered sets, 2004, SODA '04.

[10] Ricardo A. Baeza-Yates, et al. Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences, 2005, SPIRE.

[11] Andrew Chi-Chih Yao, et al. An Almost Optimal Algorithm for Unbounded Searching, 1976, Inf. Process. Lett.

[12] Derick Wood, et al. A survey of adaptive sorting algorithms, 1992, CSUR.

[13] Gaston H. Gonnet, et al. An algorithmic and complexity analysis of interpolation search, 2004, Acta Informatica.

[14] Ricardo A. Baeza-Yates, et al. A Fast Set Intersection Algorithm for Sorted Sequences, 2004, CPM.

[15] Frank K. Hwang, et al. Optimal merging of 2 elements with n elements, 2004, Acta Informatica.

[16] Erik D. Demaine, et al. Experiments on Adaptive Set Intersections for Text Retrieval Systems, 2001, ALENEX.