A new fast reduction technique based on binary nearest neighbor tree

Abstract The K-nearest neighbor (KNN) rule is one of the most useful supervised classification methods and is widely used in many pattern classification applications due to its simplicity. However, it faces prohibitive computational and storage requirements when dealing with large datasets. A reasonable way of alleviating this problem is to extract a small representative subset from the original dataset without reducing classification accuracy: most interior patterns are removed, while the boundary patterns that contribute most to classification accuracy are retained. To this end, a new algorithm based on a binary tree technique and several reduction operations is presented. The key issues of the proposed algorithm are how to build the binary nearest neighbor search tree and how to design reduction strategies that retain the patterns yielding high classification accuracy. Specifically, we first use several tree control rules together with the KNN rule to build a binary nearest neighbor tree rooted at each randomly selected pattern. Second, according to the node locations in each binary nearest neighbor tree and the selection and replacement strategies, different kinds of patterns are obtained as prototypes: some lie close to class boundary regions, others in interior regions, and some new interior patterns are generated as replacements. Finally, experimental results show that the proposed algorithm effectively reduces the number of prototypes while maintaining the same level of classification accuracy as the traditional KNN algorithm and other prototype reduction algorithms. Moreover, it is a simple and fast hybrid algorithm for prototype reduction.
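The abstract does not specify the paper's tree control rules or its exact selection and replacement strategies, so the sketch below only illustrates the general select-boundary/replace-interior idea in spirit: a pattern is treated as a boundary pattern if any of its k nearest neighbors carries a different class label, boundary patterns are kept as prototypes, and each class's interior patterns are replaced by a single generated centroid. The function name `reduce_prototypes` and the disagreement test are assumptions for illustration, not the paper's binary-tree construction.

```python
import numpy as np

def reduce_prototypes(X, y, k=3):
    """Illustrative prototype reduction (NOT the paper's algorithm):
    keep patterns near class boundaries and replace each class's
    interior patterns with a generated centroid prototype."""
    # Pairwise squared Euclidean distances (fine for small datasets).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]    # indices of the k nearest neighbors

    # Boundary pattern: at least one of the k neighbors disagrees in class.
    boundary = (y[nn] != y[:, None]).any(axis=1)

    proto_X, proto_y = [X[boundary]], [y[boundary]]
    # Replace the interior patterns of each class by their centroid,
    # i.e., a generated interior prototype.
    for c in np.unique(y):
        interior = (~boundary) & (y == c)
        if interior.any():
            proto_X.append(X[interior].mean(axis=0, keepdims=True))
            proto_y.append(np.array([c]))
    return np.vstack(proto_X), np.concatenate(proto_y)
```

The reduced set returned by such a routine would then be used in place of the full training set by a standard KNN classifier; the paper's binary nearest neighbor trees serve to avoid the quadratic distance computation used in this naive sketch.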
