Learning with Nearest Neighbour Classifiers

This paper introduces a learning strategy for designing a set of prototypes for a 1-nearest-neighbour (1-NN) classifier. In the learning phase, the 1-NN classifier is transformed into a maximum classifier whose discriminant functions use the nearest models of a mixture. The computation of the set of prototypes is then viewed as the problem of estimating the centres of a mixture model. However, instead of computing these centres with a standard procedure such as the EM algorithm, we derive a learning algorithm based on minimising the misclassification error of the 1-NN classifier on the training set. One possible implementation of the learning algorithm is presented; it is based on the online gradient descent method and uses radial Gaussian kernels for the models of the mixture. Experimental results on the handwritten NIST databases show the superiority of the proposed method over Kohonen's LVQ algorithms.
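To make the idea concrete, the following is a minimal illustrative sketch, not the paper's exact formulation: a nearest-prototype classifier whose per-class discriminants are radial Gaussian kernels centred on the nearest prototype of each class, trained by online gradient descent on a sigmoid-smoothed misclassification loss. The class and function names, the sigmoid surrogate, the shared kernel width sigma, and the learning rate are all assumptions introduced here for illustration.

# Illustrative sketch (assumed surrogate loss, not the authors' exact algorithm).
import numpy as np

def gaussian_kernel(x, m, sigma):
    """Radial Gaussian kernel centred on prototype m."""
    d2 = np.sum((x - m) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

class SoftNearestPrototype:
    def __init__(self, prototypes, labels, sigma=1.0):
        self.M = np.asarray(prototypes, dtype=float)   # prototype centres (mixture centres)
        self.y = np.asarray(labels)                    # class label of each prototype
        self.sigma = sigma

    def discriminant(self, x, c):
        """g_c(x): kernel response of the nearest prototype of class c (and its index)."""
        idx = np.where(self.y == c)[0]
        vals = [gaussian_kernel(x, self.M[i], self.sigma) for i in idx]
        best = int(np.argmax(vals))
        return vals[best], idx[best]

    def predict(self, x):
        """Maximum classifier: pick the class whose discriminant is largest."""
        classes = np.unique(self.y)
        scores = [self.discriminant(x, c)[0] for c in classes]
        return classes[int(np.argmax(scores))]

    def partial_fit(self, x, t, lr=0.05):
        """One online gradient step on a sigmoid-smoothed 0-1 loss (assumed surrogate)."""
        classes = np.unique(self.y)
        g_true, i_true = self.discriminant(x, t)
        wrong = [c for c in classes if c != t]
        g_w, i_w = max((self.discriminant(x, c) for c in wrong), key=lambda p: p[0])
        # L = sigmoid(g_wrong - g_true): small when the correct class wins by a margin.
        s = 1.0 / (1.0 + np.exp(-(g_w - g_true)))
        coeff = s * (1.0 - s) / (self.sigma ** 2)
        # Gradient descent on L moves the winning correct-class prototype towards x
        # and the strongest competing prototype away from it (an LVQ-like update).
        self.M[i_true] += lr * coeff * g_true * (x - self.M[i_true])
        self.M[i_w]    -= lr * coeff * g_w * (x - self.M[i_w])

Note that with a shared kernel width, taking the arg-max of these Gaussian discriminants at prediction time selects the class of the nearest prototype, so after training the classifier can be deployed as a plain 1-NN rule over the learned prototype set.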
