Query-specific signature selection for efficient k-nearest neighbour approximation

Finding the k-nearest neighbours (k-NN) of a point is a core primitive in many applications, such as search engines and recommendation systems. However, its computational cost is extremely high when the search runs over a huge collection of high-dimensional points. Locality-sensitive hashing (LSH) was introduced for efficient k-NN approximation, but no existing LSH approach clearly outperforms the others. We propose a novel LSH approach, Signature Selection LSH (S2LSH), which finds approximate k-NN points very efficiently across various datasets. It first constructs a large pool of highly diversified signature regions of various sizes. Given a query point, it then dynamically generates a query-specific signature region by merging highly effective signature regions selected from the pool. We also propose S2LSH-M, a variant of S2LSH that processes multiple queries more efficiently by using query-specific features and optimization techniques. Extensive experiments show that our approaches outperform existing methods in diverse settings.
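
The pool-then-merge idea described above can be illustrated with a minimal sketch. This is not the authors' S2LSH implementation: the abstract does not specify the hash family or how "highly effective" regions are chosen, so the sketch assumes random-hyperplane signatures, treats each hash bucket as a signature region, and uses a hypothetical selection rule (merge the query's smallest buckets first until a candidate budget is reached). All function names and parameters (`build_pool`, `query_region`, `approx_knn`, `budget`) are made up for illustration.

```python
import random
from collections import defaultdict


def make_hasher(dim, n_bits, rng):
    """Random-hyperplane signature: one sign bit per hyperplane (assumed hash family)."""
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def signature(point):
        return tuple(
            sum(w * x for w, x in zip(plane, point)) >= 0.0 for plane in planes
        )

    return signature


def build_pool(points, bit_lengths, tables_per_length, seed=0):
    """Build a pool of hash tables; longer signatures tend to give smaller regions."""
    rng = random.Random(seed)
    dim = len(points[0])
    pool = []
    for n_bits in bit_lengths:
        for _ in range(tables_per_length):
            signature = make_hasher(dim, n_bits, rng)
            buckets = defaultdict(set)
            for idx, point in enumerate(points):
                buckets[signature(point)].add(idx)
            pool.append((signature, buckets))
    return pool


def query_region(pool, query, budget):
    """Merge the query's own buckets, smallest first, until `budget` candidates are collected.

    "Smallest first" is a stand-in selection heuristic, not the paper's criterion.
    """
    regions = [buckets.get(signature(query), set()) for signature, buckets in pool]
    merged = set()
    for region in sorted((r for r in regions if r), key=len):
        merged |= region
        if len(merged) >= budget:
            break
    return merged


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def approx_knn(points, pool, query, k, budget):
    """Rank only the merged candidate region exactly and return up to k indices."""
    candidates = query_region(pool, query, budget)
    return sorted(candidates, key=lambda i: sq_dist(points[i], query))[:k]
```

In this sketch the pool is built once, while the merged region is recomputed per query, which mirrors the split between pool construction and query-time region generation in the text. The `budget` parameter trades accuracy for speed: a larger budget merges more regions and ranks more candidates exactly.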
