Subspace similarity search using the ideas of ranking and top-k retrieval

There are abundant scenarios for applications of similarity search in databases where the similarity of objects is defined for a subset of attributes, i.e., in a subspace, only. While much research has been done in efficient support of single column similarity queries or of similarity queries in the full space, scarcely any support of similarity search in subspaces has been provided so far. The three existing approaches are variations of the sequential scan. Here, we propose the first index-based solution to subspace similarity search in arbitrary subspaces which is based on the concepts of nearest neighbor ranking and top-k retrieval.

[1]  Hanan Samet,et al.  Foundations of multidimensional and metric data structures , 2006, Morgan Kaufmann series in data management systems.

[2]  Hans-Peter Kriegel,et al.  Optimal multi-step k-nearest neighbor search , 1998, SIGMOD '98.

[3]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[4]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[5]  Xiang Lian,et al.  Similarity Search in Arbitrary Subspaces Under Lp-Norm , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[6]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[7]  Ronald Fagin,et al.  Combining Fuzzy Information from Multiple Systems , 1999, J. Comput. Syst. Sci..

[8]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[9]  Martin L. Kersten,et al.  Efficient k-NN search on vertically decomposed data , 2002, SIGMOD '02.

[10]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[11]  Erkki Oja,et al.  Use of Image Subset Features in Image Retrieval with Self-Organizing Maps , 2004, CIVR.

[12]  Hans-Peter Kriegel,et al.  Efficient Query Processing in Arbitrary Subspaces Using Vector Approximations , 2006, 18th International Conference on Scientific and Statistical Database Management (SSDBM'06).

[13]  Surya Nepal,et al.  Query processing issues in image (multimedia) databases , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[14]  Hans-Peter Kriegel,et al.  Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering , 2009, TKDD.

[15]  Moni Naor,et al.  Optimal aggregation algorithms for middleware , 2001, PODS '01.

[16]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[17]  Xiaofei He Incremental semi-supervised subspace learning for image retrieval , 2004, MULTIMEDIA '04.