Efficient Top-k Indexing via General Reductions

Let D be a set of n elements, each associated with a real-valued weight, and let Q be the set of all possible predicates allowed on those elements. Given a predicate q ∈ Q and an integer k, a top-k query returns the k elements with the largest weights among the elements of D satisfying q. The corresponding data structure problem aims to store D in small space so that every query can be answered efficiently. It is already known that, before settling the problem, one must be able to solve two degenerate accompanying problems: (i) prioritized reporting: given a predicate q ∈ Q and a real value τ, return all the elements of D that satisfy q and have weights at least τ; (ii) max reporting: top-k queries with k fixed to 1. In this paper we prove general reductions in external memory that explore the opposite direction. Our first reduction shows that, under mild conditions, any prioritized reporting structure yields a static top-k structure whose query time is slower by only a factor of O(log_B n), where B is the block size. Our second reduction shows that if one additionally has a max reporting structure, then combining the two structures yields a top-k structure with no performance slowdown (in space, query, and update) in expectation. These reductions significantly simplify the design of top-k structures, as we showcase on numerous problems including halfspace reporting, circular reporting, interval stabbing, point enclosure, and 3D dominance. All the proposed techniques work directly in the RAM model as well.
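To make the three query types concrete, the following is a brute-force Python sketch over an in-memory list. The names `prioritized_report`, `max_report`, and `top_k_via_prioritized` are hypothetical, and a predicate is modeled, as an assumption, as a plain callable on an element key. The top-k function illustrates only the easy observation that a top-k query can be answered using prioritized-reporting calls alone, by binary-searching over candidate thresholds τ. It issues O(log n) prioritized queries and ignores I/O cost and the size of intermediate outputs; it is not the reduction proved in the paper.

```python
from typing import Callable, List, Optional, Tuple

# Assumed model: an element is a (key, weight) pair, a predicate is a callable on the key.
Element = Tuple[str, float]
Predicate = Callable[[str], bool]


def prioritized_report(data: List[Element], q: Predicate, tau: float) -> List[Element]:
    """Hypothetical brute-force prioritized reporting: elements satisfying q with weight >= tau."""
    return [e for e in data if q(e[0]) and e[1] >= tau]


def max_report(data: List[Element], q: Predicate) -> Optional[Element]:
    """Hypothetical brute-force max reporting (top-k with k = 1); None if nothing satisfies q."""
    matches = [e for e in data if q(e[0])]
    return max(matches, key=lambda e: e[1]) if matches else None


def top_k_via_prioritized(data: List[Element], q: Predicate, k: int) -> List[Element]:
    """Naive top-k built only from prioritized-reporting calls (NOT the paper's reduction).

    The number of matches with weight >= tau never increases as tau grows, so we
    binary-search the sorted distinct weights for the largest tau that still yields
    at least k matches; that tau equals the k-th largest matching weight.
    """
    if k <= 0:
        return []
    taus = sorted({w for _, w in data})  # candidate thresholds, ascending
    lo, hi = 0, len(taus) - 1
    best = None  # index of the largest tau with at least k matches
    while lo <= hi:
        mid = (lo + hi) // 2
        if len(prioritized_report(data, q, taus[mid])) >= k:
            best = mid
            lo = mid + 1  # enough matches: try a larger threshold
        else:
            hi = mid - 1  # too few matches: lower the threshold
    # If fewer than k elements satisfy q at all, every match is reported.
    tau = taus[best] if best is not None else float("-inf")
    result = prioritized_report(data, q, tau)
    result.sort(key=lambda e: e[1], reverse=True)  # ties may push the count above k
    return result[:k]
```

For example, with `data = [("a", 5.0), ("b", 9.0), ("c", 1.0), ("d", 7.0), ("e", 3.0)]` and `q = lambda key: key != "b"`, `top_k_via_prioritized(data, q, 2)` returns `[("d", 7.0), ("a", 5.0)]` and `max_report(data, q)` returns `("d", 7.0)`.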
