Fast Online k-nn Graph Building

In this paper we propose an online approximate k-nn graph building algorithm that can quickly update a k-nn graph from a stream of data points. A key step of the algorithm is to use the current distributed graph to search for the neighbors of a new node. We therefore also propose a distributed partitioning method based on balanced k-medoids clustering, which we use to optimize the distributed search process. Finally, we present the improved sequential search procedure that is used inside each partition. We also perform an experimental evaluation of the different algorithms, in which we study the influence of the parameters and compare the results of our algorithms with the existing state of the art. This evaluation confirms that the fast online k-nn graph building algorithm produces a graph that is highly similar to the graph produced by an offline exhaustive algorithm, while requiring fewer similarity computations.
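
To make the online update step concrete, the following is a minimal single-machine sketch in Python. It is not the authors' procedure, which this abstract does not specify. It assumes a greedy walk over the current graph as the search, negative squared Euclidean distance as the similarity, and a plain dictionary as the graph; the names `similarity`, `search_neighbors`, `add_point`, `n_starts`, and `max_sims` are all hypothetical. Partitioning and distribution are left out entirely.

```python
import heapq
import random


def similarity(a, b):
    """Assumed similarity: negative squared Euclidean distance (larger = closer)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def search_neighbors(graph, points, query, k, n_starts=3, max_sims=50, rng=random):
    """Greedy walk over the current graph from a few random start nodes.

    Returns up to k (similarity, node_id) pairs, most similar first.
    At most max_sims similarities to the query are computed.
    """
    if not graph:
        return []
    scored = {}  # node id -> similarity to the query, each computed once
    ids = list(graph)
    for start in rng.sample(ids, min(n_starts, len(ids))):
        current = start
        while len(scored) < max_sims:
            if current not in scored:
                scored[current] = similarity(points[current], query)
            best = current
            for nb in graph[current]:
                if nb not in scored:
                    if len(scored) >= max_sims:
                        break  # similarity budget exhausted
                    scored[nb] = similarity(points[nb], query)
                if scored[nb] > scored[best]:
                    best = nb
            if best == current:
                break  # local optimum: no neighbor is closer to the query
            current = best
    return heapq.nlargest(k, ((s, i) for i, s in scored.items()))


def add_point(graph, points, new_id, vector, k, **search_args):
    """Insert a new node using the graph itself to find its approximate neighbors."""
    neighbors = search_neighbors(graph, points, vector, k, **search_args)
    points[new_id] = vector
    graph[new_id] = [i for _, i in neighbors]
    # The new node may also belong among the k nearest of the nodes it found.
    for _, i in neighbors:
        candidates = graph[i] + [new_id]
        candidates.sort(key=lambda j: similarity(points[i], points[j]), reverse=True)
        graph[i] = candidates[:k]
    return graph[new_id]
```

Calling `add_point(graph, points, new_id, vector, k)` once per arriving point, starting from empty `graph` and `points` dictionaries, grows the graph incrementally. The `max_sims` budget caps the number of similarity computations per search, which is the quantity an online method trades against graph accuracy.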
