Near-Optimal Range Reporting Structures for Categorical Data

Range reporting on categorical (or colored) data is a well-studied generalization of the classical range reporting problem in which each of the N input points has an associated color (category). A query then asks to report the set of colors of the points in a given rectangular query range, which may be far smaller than the set of all points in the query range. We study two-dimensional categorical range reporting in both the word-RAM and the I/O-model. For the I/O-model, we present two alternative data structures for three-sided queries. The first answers queries in optimal O(lg_B N + K/B) I/Os using O(N lg* N) space, where K is the number of distinct colors in the output, B is the disk block size, and lg* N is the iterated logarithm of N. Our second data structure uses linear space and answers queries in O(lg_B N + lg^(h) N + K/B) I/Os for any constant integer h ≥ 1. Here lg^(1) N = lg N and lg^(h) N = lg(lg^(h−1) N) when h > 1. Both solutions use only comparisons on the coordinates. We also show that the lg_B N terms in the query costs can be reduced to optimal lg lg_B U when the input points lie on a U × U grid and we allow word-level manipulations of the coordinates. We further reduce the query time to just O(1) if the points are given on an N × N grid. Both solutions also lead to improved data structures for four-sided queries. For the word-RAM, we obtain optimal data structures for three-sided range reporting, as well as improved upper bounds for four-sided range reporting. Finally, we show a tight lower bound on one-dimensional categorical range counting using an elegant reduction from (standard) two-dimensional range counting.
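To make the problem statement and the notation concrete, the following is a minimal Python sketch, not the data structures of the paper. It gives a brute-force baseline for a three-sided categorical query and evaluates the nested logarithm and the iterated logarithm used in the bounds. The function names, the (x, y, color) tuple layout, and the choice of a query range that is unbounded downward in y are assumptions made for illustration only.

```python
import math


def colors_in_three_sided_range(points, x1, x2, y_max):
    """Brute-force baseline (hypothetical helper, O(N) time per query).

    points: iterable of (x, y, color) tuples (assumed layout).
    Returns the set of distinct colors among points with
    x1 <= x <= x2 and y <= y_max (assumed orientation of the open side).
    """
    return {color for (x, y, color) in points if x1 <= x <= x2 and y <= y_max}


def nested_lg(n, h):
    """lg^(h) n: lg^(1) n = lg n and lg^(h) n = lg(lg^(h-1) n) for h > 1.

    Raises ValueError if an intermediate value is not positive.
    """
    value = float(n)
    for _ in range(h):
        value = math.log2(value)
    return value


def lg_star(n):
    """Iterated logarithm: how many times lg is applied until the value is <= 1."""
    value = float(n)
    count = 0
    while value > 1:
        value = math.log2(value)
        count += 1
    return count


if __name__ == "__main__":
    pts = [(1, 1, "red"), (2, 5, "red"), (3, 2, "blue"),
           (4, 9, "green"), (5, 1, "blue")]
    # Three points fall in the range, but only K = 2 distinct colors are reported.
    print(colors_in_three_sided_range(pts, 1, 4, 5))
    print(nested_lg(65536, 2), lg_star(65536))
```

The baseline scans all N points on every query; the structures described above aim for query costs that depend on the number of colors K in the output rather than on the number of points in the range.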
