kNNVWC: An efficient k-nearest neighbours approach based on Various-Widths Clustering

In this paper, a novel k-NN approach based on Various-Widths Clustering, named kNNVWC, is proposed to efficiently find the k nearest neighbours (k-NNs) of a query object in a given data set. kNNVWC clusters the data using various widths: the data set is first clustered with a global width, and each resulting cluster that meets the predefined criteria is then recursively clustered with its own local width, chosen to suit its distribution. Experimental results demonstrate that kNNVWC performs well compared to state-of-the-art k-NN search algorithms.
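The two-level idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the splitting criteria, how the local width is chosen, or how the search prunes clusters, so the following are all assumptions made for illustration: a cluster is re-clustered when it holds more than `max_size` points, its local width is `shrink` times its radius, and the search skips clusters using the triangle inequality. All function and parameter names are hypothetical.

```python
import heapq
import math
import random


def fixed_width_clusters(points, width):
    """Greedy fixed-width clustering: each new centroid absorbs every
    remaining point that lies within `width` of it."""
    remaining = list(points)
    clusters = []
    while remaining:
        centroid = remaining[0]
        members = [p for p in remaining if math.dist(p, centroid) <= width]
        remaining = [p for p in remaining if math.dist(p, centroid) > width]
        clusters.append((centroid, members))
    return clusters


def various_width_clusters(points, width, max_size=32, shrink=0.5):
    """Cluster with a global width, then re-cluster oversized clusters.

    Assumed criteria (not from the paper): a cluster is split again when it
    has more than `max_size` members, using a local width of
    `shrink * radius`.
    """
    leaves = []
    stack = [(list(points), width)]
    while stack:
        pts, w = stack.pop()
        for centroid, members in fixed_width_clusters(pts, w):
            radius = max(math.dist(centroid, m) for m in members)
            if len(members) > max_size and radius > 0:
                stack.append((members, radius * shrink))
            else:
                leaves.append((centroid, radius, members))
    return leaves


def knn(leaves, query, k):
    """Exact k-NN over the leaf clusters, nearest centroid first.

    A cluster is skipped when d(query, centroid) - radius exceeds the
    current k-th best distance: by the triangle inequality none of its
    members can be closer than that bound.
    """
    best = []  # max-heap of the k best so far, stored as (-distance, point)
    ordered = sorted(leaves, key=lambda leaf: math.dist(query, leaf[0]))
    for centroid, radius, members in ordered:
        lower_bound = math.dist(query, centroid) - radius
        if len(best) == k and lower_bound > -best[0][0]:
            continue
        for p in members:
            d = math.dist(query, p)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-neg_d, p) for neg_d, p in best)


if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(500)]
    index = various_width_clusters(data, width=0.3)
    for dist, point in knn(index, (0.5, 0.5), k=5):
        print(round(dist, 4), point)
```

Because the pruning test only discards clusters that provably cannot contain a closer point, the sketch returns the same neighbours as a brute-force scan; any speed-up depends on how tightly the local widths fit the data.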
