Significant-Presence Range Queries in Categorical Data

In traditional colored range-searching problems, one wants to store a set of n objects with m distinct colors so as to answer the following queries: report all colors that have at least one object intersecting the query range. Such an object, however, could be an ‘outlier’ in its color class. Therefore we consider a variant of this problem in which one reports only those colors for which at least a fraction τ of the objects of that color intersect the query range, for some parameter τ. Our main results concern an approximate version of this problem, where we may also report colors for which at least a fraction (1 − ε)τ of the objects intersect the query range, for some fixed ε > 0. We present efficient data structures for such queries with orthogonal query ranges in sets of colored points, and for point stabbing queries in sets of colored rectangles.
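To make the query semantics concrete, here is a minimal brute-force sketch for colored points and an axis-parallel query box. It is not the data structure described above. It simply scans all n points per query, which makes it a useful correctness baseline. The function names `significant_colors` and `is_valid_approx_answer` are hypothetical, and the sketch assumes 0 < τ ≤ 1. An approximate answer is valid exactly when it contains every color meeting the τ threshold and contains only colors meeting the (1 − ε)τ threshold.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple

Point = Tuple[float, ...]


def significant_colors(points: List[Tuple[Point, Hashable]],
                       lo: Point, hi: Point, tau: float) -> Set[Hashable]:
    """Exact query: colors with at least a fraction tau of their points in [lo, hi].

    Brute-force O(n) scan per query (hypothetical baseline, not the paper's
    structure). Assumes 0 < tau <= 1, so colors with no point inside are
    never reported.
    """
    total = Counter(color for _, color in points)
    inside = Counter(
        color for p, color in points
        if all(l <= x <= h for x, l, h in zip(p, lo, hi))  # point in the box
    )
    return {c for c, k in inside.items() if k >= tau * total[c]}


def is_valid_approx_answer(answer: Iterable[Hashable],
                           points: List[Tuple[Point, Hashable]],
                           lo: Point, hi: Point,
                           tau: float, eps: float) -> bool:
    """Check an approximate answer: it must report every tau-significant color
    and may additionally report colors that are only (1 - eps)*tau-significant."""
    must = significant_colors(points, lo, hi, tau)
    may = significant_colors(points, lo, hi, (1 - eps) * tau)
    return must <= set(answer) <= may
```

For example, if half of the red points lie in the query box, red is reported for τ = 0.5 but not for τ = 0.6. With τ = 0.6 and ε = 0.25, red becomes an optional color that an approximate structure may or may not report.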