Scalable k-NN graph construction for visual descriptors

The k-NN graph plays a central role in increasingly popular data-driven techniques for a variety of learning and vision tasks, yet constructing k-NN graphs efficiently and accurately remains a challenge, especially for large-scale, high-dimensional data. In this paper, we propose a new approach to constructing approximate k-NN graphs with an emphasis on both efficiency and accuracy. We hierarchically and randomly divide the data points into subsets and build an exact neighborhood graph over each subset, which yields a base approximate neighborhood graph. We then repeat this process several times to generate multiple neighborhood graphs, which are combined into a more accurate approximate neighborhood graph. We further propose a neighborhood propagation scheme to improve the accuracy. We analyze the accuracy and efficiency of our approach to k-NN graph construction both theoretically and empirically, and demonstrate a significant speed-up on large-scale visual data.
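
The pipeline described in the abstract (random hierarchical division, exact graphs on the subsets, combination over several repetitions, then propagation) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made for illustration only: the split rule (a random Gaussian direction cut at the median projection), the Euclidean distance, the simplified "neighbors of neighbors" propagation, and all function and parameter names (`random_partition`, `keep_k_nearest`, `approx_knn_graph`, `leaf_size`, `n_trees`, `n_prop`).

```python
import numpy as np


def random_partition(X, idx, leaf_size, rng):
    """Recursively split the points in idx until subsets hold <= leaf_size points.

    Assumption: each split uses a random direction and cuts at the median
    projection; the abstract only says the division is hierarchical and random.
    leaf_size must be at least 1.
    """
    if len(idx) <= leaf_size:
        return [idx]
    direction = rng.standard_normal(X.shape[1])
    order = np.argsort(X[idx] @ direction)
    half = len(idx) // 2
    return (random_partition(X, idx[order[:half]], leaf_size, rng)
            + random_partition(X, idx[order[half:]], leaf_size, rng))


def keep_k_nearest(X, i, candidates, k):
    """Return the k candidates closest to point i (excluding i), nearest first."""
    pool = np.unique(candidates)
    pool = pool[pool != i]
    dist = np.linalg.norm(X[pool] - X[i], axis=1)
    return pool[np.argsort(dist)[:k]]


def approx_knn_graph(X, k, leaf_size=50, n_trees=4, n_prop=1, seed=0):
    """Approximate k-NN graph as a list of neighbor-index arrays."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    neighbors = [np.empty(0, dtype=int) for _ in range(n)]

    # Repeat the random division several times. Within each subset the search
    # is exact (brute force), and results are merged with earlier repetitions.
    for _ in range(n_trees):
        for leaf in random_partition(X, np.arange(n), leaf_size, rng):
            for i in leaf:
                merged = np.concatenate([neighbors[i], leaf])
                neighbors[i] = keep_k_nearest(X, i, merged, k)

    # Simplified propagation (an assumption, not the paper's scheme):
    # test each point's neighbors of neighbors as additional candidates.
    for _ in range(n_prop):
        snapshot = [nb.copy() for nb in neighbors]
        for i in range(n):
            cand = np.concatenate([snapshot[j] for j in snapshot[i]] + [snapshot[i]])
            neighbors[i] = keep_k_nearest(X, i, cand, k)
    return neighbors
```

Calling `approx_knn_graph(X, k=5)` on an `(n, d)` array returns one index array per point. In this sketch, a larger `n_trees` or `leaf_size` trades running time for accuracy, and setting `leaf_size` to at least `n` reduces the procedure to an exact brute-force search.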
