New rank methods for reducing the size of the training set using the nearest neighbor rule

This paper proposes new rank methods for selecting the best prototypes from a training set, so that its size can be set through an external parameter while the classification accuracy is maintained. Traditional methods that filter the training set for a classification task, such as editing and condensing, apply rules to the set in order to remove outliers or to keep only the prototypes that help in classification. In our approach, new voting methods are proposed to compute, for each prototype, the probability that it helps to classify a new sample correctly. This probability is the key to ranking the training set: a relevance factor between 0 and 1 is used to select, for each class, the best candidates whose accumulated probabilities remain below that parameter. This approach makes it possible to choose the number of prototypes needed to maintain, or even increase, the classification accuracy. Results obtained on several high-dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.

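The abstract does not spell out the exact voting rule, so the sketch below is only a rough Python illustration of the selection mechanism it describes, under stated assumptions: a hypothetical leave-one-out voting scheme stands in for the paper's voting methods to estimate a per-prototype probability, and `rank_select` then keeps, for each class, the top-ranked prototypes whose accumulated probability stays below the relevance factor. The function names and the voting rule are illustrative assumptions, not the authors' method.

```python
import numpy as np

def voting_probabilities(X, y):
    """Illustrative voting scheme (an assumption, not the paper's rule):
    each prototype gets one vote every time it is the leave-one-out nearest
    neighbour of another training sample of the same class; votes are then
    normalised per class into a probability distribution."""
    n = len(X)
    votes = np.zeros(n)
    # pairwise squared Euclidean distances, with self-distances excluded
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)                     # leave-one-out nearest neighbour
    for i, j in enumerate(nn):
        if y[i] == y[j]:                      # neighbour j classifies sample i correctly
            votes[j] += 1
    probs = np.zeros(n)
    for c in np.unique(y):
        idx = y == c
        total = votes[idx].sum()
        if total > 0:
            probs[idx] = votes[idx] / total   # per-class probabilities sum to 1
    return probs


def rank_select(X, y, probs, relevance=0.8):
    """Keep, for each class, the highest-probability prototypes whose
    accumulated probability stays below the relevance factor in [0, 1]."""
    keep = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        order = idx[np.argsort(-probs[idx])]  # rank the class by descending probability
        cumulative = np.cumsum(probs[order])
        selected = order[cumulative <= relevance]
        if len(selected) == 0:                # always retain at least one prototype per class
            selected = order[:1]
        keep.extend(selected.tolist())
    keep = np.array(sorted(keep))
    return X[keep], y[keep]
```

In use, the reduced set returned by `rank_select` would replace the full training set in a 1-NN classifier; a larger relevance value retains more prototypes per class, so the parameter directly trades set size against accuracy, which is the behaviour the abstract describes.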