A new fast search algorithm for exact k-nearest neighbors based on optimal triangle-inequality-based check strategy

Abstract The k-nearest neighbor (KNN) algorithm is widely used in pattern recognition, regression, outlier detection, and other data mining areas. However, it suffers from high distance-computation cost, especially in big data applications. In this paper, we propose a new fast search (FS) algorithm for exact k-nearest neighbors based on an optimal triangle-inequality-based (OTI) check strategy. When searching for the exact k-nearest neighbors of a query, the OTI check strategy eliminates more redundant distance computations than the original TI check strategy for instances located in the marginal areas of neighboring clusters. Considering the large space complexity and extra time complexity of OTI, we also propose an efficient optimal triangle-inequality-based (EOTI) check strategy. Experimental results demonstrate that the two proposed algorithms (OTI and EOTI) outperform other related KNN fast search algorithms, especially on high-dimensional datasets.
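To make the underlying idea concrete, the sketch below illustrates the classic triangle-inequality (TI) check that the paper's OTI/EOTI strategies refine, not the proposed algorithms themselves. For a point x assigned to cluster center c, the triangle inequality gives d(q, x) ≥ |d(q, c) − d(x, c)|; if that lower bound already exceeds the current kth-nearest distance, d(q, x) need not be computed. All function and variable names here are illustrative assumptions.

```python
import heapq
import math


def dist(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_ti(query, points, centers, assign, k):
    """Exact k-NN with a triangle-inequality pruning check.

    assign[i] gives the index of the cluster center of points[i];
    distances from points to their centers are precomputed once.
    Returns (sorted list of (distance, index), number of pruned points).
    """
    d_pc = [dist(points[i], centers[assign[i]]) for i in range(len(points))]
    d_qc = [dist(query, c) for c in centers]

    heap = []  # max-heap via negated distances: current k nearest
    pruned = 0
    for i, p in enumerate(points):
        # Lower bound on d(query, p) from the triangle inequality
        lower = abs(d_qc[assign[i]] - d_pc[i])
        if len(heap) == k and lower >= -heap[0][0]:
            pruned += 1  # cannot beat the current kth distance: skip
            continue
        d = dist(query, p)
        if len(heap) < k:
            heapq.heappush(heap, (-d, i))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, i))
    return sorted((-nd, i) for nd, i in heap), pruned
```

Because the bound is a true lower bound on the real distance, the pruning never discards an actual neighbor, so the result is exact; the savings grow with how tightly clusters separate the data, which is precisely the regime the OTI strategy targets for points near cluster margins.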
