Fast Intersection Algorithms for Sorted Sequences

This paper presents and analyzes a simple intersection algorithm for sorted sequences that is fast on average. It is related to the multiple searching problem and to merging. We present worst-case and average-case analyses, showing that in the worst case the complexity adapts to the size of the smaller list. In the average case, the algorithm performs fewer comparisons than the total number of elements in the two inputs, of sizes n and m, when n = αm (α > 1), achieving O(m log(n/m)) complexity. The algorithm is motivated by its application to query processing in Web search engines, where large intersections, or differences, must be computed quickly. For this setting we show experimentally that the algorithm is faster than previous solutions.
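
The abstract does not spell out the procedure, so the following is only an illustrative sketch of one divide-and-conquer scheme that combines searching and merging: take the median of the smaller sequence, binary-search it in the larger one, and recurse on the two halves. The function names (`intersect_sorted`, `_intersect`) and the assumption that each input holds distinct, sorted keys are ours, not taken from the paper, and the sketch should not be read as the authors' exact algorithm.

```python
from bisect import bisect_left


def intersect_sorted(a, b):
    """Return the sorted intersection of two sorted lists of distinct keys.

    Illustrative sketch only; names and details are assumptions.
    """
    out = []
    _intersect(a, 0, len(a), b, 0, len(b), out)
    return out


def _intersect(a, alo, ahi, b, blo, bhi, out):
    # An empty range on either side cannot contribute any match.
    if alo >= ahi or blo >= bhi:
        return
    # Always take the median from the smaller range, so the number of
    # binary searches is governed by the smaller input.
    if ahi - alo > bhi - blo:
        a, alo, ahi, b, blo, bhi = b, blo, bhi, a, alo, ahi
    mid = (alo + ahi) // 2
    pivot = a[mid]
    # Binary search for the pivot inside the current range of the larger list.
    pos = bisect_left(b, pivot, blo, bhi)
    found = pos < bhi and b[pos] == pivot
    # Keys smaller than the pivot can only match to the left of pos.
    _intersect(a, alo, mid, b, blo, pos, out)
    if found:
        out.append(pivot)
    # Keys larger than the pivot can only match to the right of pos.
    _intersect(a, mid + 1, ahi, b, pos + (1 if found else 0), bhi, out)


if __name__ == "__main__":
    print(intersect_sorted([1, 3, 4, 7, 9], [2, 3, 5, 7, 8, 9, 11]))
```

Because the left half is processed before the pivot and the right half after it, the result comes out in sorted order without a final sort. Each recursive step costs one binary search in the larger range, which is where a bound that depends mostly on the smaller input can come from.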
