Efficient Algorithms for Similarity Search in Axis-Aligned Subspaces

Many applications, such as content-based image retrieval, subspace clustering, and feature selection, may benefit from efficient subspace similarity search. Given a query object, the goal of subspace similarity search is to retrieve the most similar objects from the database, where the similarity distance is defined over an arbitrary subset of dimensions (or features), that is, over an arbitrary axis-aligned projective subspace. Although much effort has been spent on similarity search in fixed subspaces, relatively little attention has been given to similarity search when the dimensions are specified only at query time. In this paper, we propose several new methods for the subspace similarity search problem. Extensive experiments show that the proposed methods are highly competitive with state-of-the-art solutions.
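To make the problem definition concrete, the following is a minimal sequential-scan sketch of a k-nearest-neighbor query in a query-specified subspace. It is not one of the methods proposed in the paper. It only illustrates the query semantics. The choice of an Lp distance and the helper names `subspace_distance` and `subspace_knn` are illustrative assumptions, not taken from the paper.

```python
import heapq
from typing import List, Sequence


def subspace_distance(x: Sequence[float], q: Sequence[float],
                      dims: Sequence[int], p: float = 2.0) -> float:
    """Lp distance between x and q using only the coordinates listed in dims.

    Assumption: the subspace distance is an Lp norm over the chosen axes.
    """
    return sum(abs(x[i] - q[i]) ** p for i in dims) ** (1.0 / p)


def subspace_knn(data: Sequence[Sequence[float]], q: Sequence[float],
                 dims: Sequence[int], k: int, p: float = 2.0) -> List[int]:
    """Indices of the k objects nearest to q in the axis-aligned subspace dims.

    Sequential scan: every object is compared, so one query costs
    O(n * |dims|). Ties keep the original database order, because
    heapq.nsmallest behaves like sorted(...)[:k].
    """
    return heapq.nsmallest(
        k, range(len(data)),
        key=lambda j: subspace_distance(data[j], q, dims, p))


if __name__ == "__main__":
    data = [
        [0.0, 0.0, 9.0],
        [1.0, 1.0, 0.0],
        [5.0, 5.0, 0.0],
        [0.0, 1.0, 5.0],
    ]
    q = [0.0, 0.0, 0.0]
    # The neighbor set depends on which dimensions the query selects.
    print(subspace_knn(data, q, [0, 1, 2], k=2))  # [1, 3]
    print(subspace_knn(data, q, [0, 1], k=2))     # [0, 3]
    print(subspace_knn(data, q, [2], k=2))        # [1, 2]
```

Because the subset of dimensions is unknown until query time, a single structure built for one fixed subspace cannot simply be reused. A scan like this one is therefore the natural baseline for any subspace search method to improve on.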
