Subspace Similarity Search: Efficient k-NN Queries in Arbitrary Subspaces

Many database applications of similarity search define the similarity of objects only for a subset of attributes, i.e., in a subspace. While much research has addressed efficient support for single-column similarity queries and for similarity queries in the full space, scarcely any support for similarity search in subspaces has been provided so far. The three existing approaches are variations of the sequential scan. Here, we propose the first index-based solution to subspace similarity search in arbitrary subspaces.
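
To make the query type concrete, the following is a minimal sketch of a subspace k-NN query answered by a plain sequential scan, i.e., the kind of baseline that the existing approaches vary, not the index-based solution proposed here. The function names, the tuple-based data layout, and the choice of Euclidean distance are assumptions made for illustration only; the abstract does not fix a distance function.

```python
import heapq
import math


def subspace_distance(a, b, dims):
    """Euclidean distance between a and b restricted to the attributes in dims.

    Assumption: Euclidean distance is used purely as an example.
    """
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in dims))


def subspace_knn_scan(data, query, dims, k):
    """Return the k nearest objects to query in the subspace dims.

    Hypothetical baseline: scans every object once and keeps the k smallest
    subspace distances. Results are (distance, index) pairs, nearest first.
    """
    scored = (
        (subspace_distance(obj, query, dims), idx)
        for idx, obj in enumerate(data)
    )
    return heapq.nsmallest(k, scored)


# Toy data set with three attributes per object.
data = [
    (0.0, 0.0, 9.0),
    (1.0, 1.0, 0.0),
    (5.0, 5.0, 5.0),
    (0.5, 0.0, 100.0),
]
query = (0.0, 0.0, 0.0)

# Query in the subspace spanned by attributes 0 and 1 only.
nearest_in_subspace = subspace_knn_scan(data, query, dims=[0, 1], k=2)

# Same query in the full space, for comparison.
nearest_in_full_space = subspace_knn_scan(data, query, dims=[0, 1, 2], k=1)
```

In this toy example the objects at indices 0 and 3 are the nearest neighbors in the subspace of attributes 0 and 1, although both are far from the query in the full space, where the object at index 1 is nearest. Because the subspace is chosen at query time, an index built for the full space or for one fixed subspace does not directly apply.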
