Indexing High Dimensional Rectangles for Fast Multimedia Identification

This paper addresses the problem of quickly performing point queries against high-dimensional regions. Such queries are useful in the increasingly important problems of multimedia identification and retrieval, where different database entries have different metrics for similarity. While the database literature has focused on indexing for high-dimensional nearest-neighbor and epsilon-range queries, indexing for point queries against high-dimensional regions has not been addressed. We present an efficient indexing method for these queries, which combines redundancy with bit vector indexing to achieve significant performance gains. We have implemented our approach in a real-world audio fingerprinting system and obtained a 56-fold speedup over linear scan. Furthermore, we show that the well-known Hilbert bulk-loaded R-tree, a technique capable of searching low-dimensional regions, is ineffective in our audio fingerprinting system because of the inherently high-dimensional nature of the problem.
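To make the idea of a point query against rectangles concrete, the following is a minimal sketch of one way bit vectors could be used for it. It is not the paper's algorithm, whose details are not given in this abstract. The class name `BitVectorRectIndex`, the parameter `bins_per_dim`, and the equal-width binning are all assumptions made for illustration. Each dimension is cut into bins, each bin holds a bit vector marking the rectangles that overlap it, and a query ANDs one bit vector per dimension before verifying the surviving candidates exactly. The only redundancy here is that a rectangle is recorded in every bin it overlaps, which may differ from what the paper means by the term.

```python
from bisect import bisect_right


class BitVectorRectIndex:
    """Hypothetical sketch: per-dimension bit vectors over axis-aligned rectangles."""

    def __init__(self, rects, bins_per_dim=8):
        # rects: list of (lows, highs) pairs, each a tuple of per-dimension bounds.
        self.rects = rects
        self.dims = len(rects[0][0])
        self.edges = []  # per dimension: sorted interior bin boundaries
        self.masks = []  # per dimension: one integer bit vector per bin
        for d in range(self.dims):
            lo = min(r[0][d] for r in rects)
            hi = max(r[1][d] for r in rects)
            width = (hi - lo) / bins_per_dim or 1.0  # avoid zero-width bins
            edges = [lo + width * i for i in range(1, bins_per_dim)]
            masks = [0] * bins_per_dim
            for rid, (lows, highs) in enumerate(rects):
                first = bisect_right(edges, lows[d])
                last = bisect_right(edges, highs[d])
                # Record the rectangle in every bin its interval overlaps.
                for b in range(first, last + 1):
                    masks[b] |= 1 << rid
            self.edges.append(edges)
            self.masks.append(masks)

    def _contains(self, rid, point):
        lows, highs = self.rects[rid]
        return all(l <= x <= h for l, x, h in zip(lows, point, highs))

    def query(self, point):
        """Return the ids of all rectangles that contain the point."""
        candidates = (1 << len(self.rects)) - 1
        for d in range(self.dims):
            b = bisect_right(self.edges[d], point[d])
            candidates &= self.masks[d][b]
            if not candidates:
                return []
        # Bins are coarse, so candidates are a superset: verify each exactly.
        hits = []
        rid = 0
        while candidates:
            if candidates & 1 and self._contains(rid, point):
                hits.append(rid)
            candidates >>= 1
            rid += 1
        return hits


# Usage with made-up three-dimensional rectangles.
example_rects = [
    ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
    ((0.5, 0.5, 0.5), (2.0, 2.0, 2.0)),
    ((3.0, 3.0, 3.0), (4.0, 4.0, 4.0)),
]
example_index = BitVectorRectIndex(example_rects, bins_per_dim=4)
print(example_index.query((0.75, 0.75, 0.75)))
```

In this sketch the AND across dimensions discards most rectangles cheaply, and the exact containment test removes the false positives that coarse bins let through. How well such a filter prunes in truly high-dimensional data depends on the data and the binning, which this sketch does not attempt to characterize.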
