Soft nearest prototype classification

We propose a new method for constructing nearest prototype classifiers that is based on a Gaussian mixture ansatz and can be interpreted as an annealed version of learning vector quantization (LVQ). The algorithm performs gradient descent on a cost function that penalizes classification errors on the training set. We investigate the properties of the algorithm and assess its performance on several toy data sets and on an optical letter classification task. Results show 1) that annealing in the dispersion parameter of the Gaussian kernels improves classification accuracy; 2) that classification results are better than those obtained with standard learning vector quantization (LVQ 2.1, LVQ 3) for equal numbers of prototypes; and 3) that annealing the width parameter improves classification capability. Additionally, the principled approach explains a number of features of the (heuristic) LVQ methods.
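
To make the construction concrete, here is a minimal NumPy sketch of one common formulation of such a soft nearest prototype classifier: labeled prototypes, normalized Gaussian kernels as soft assignment probabilities, gradient descent on the expected misclassification probability, and a width parameter sigma that is annealed from large to small during training. The function names, the geometric annealing schedule, and the learning-rate settings are illustrative assumptions, not the paper's exact algorithm or parameters.

```python
import numpy as np

def soft_assignments(X, prototypes, sigma):
    """P(j|x): normalized Gaussian kernels centered on each prototype."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)  # (N, J)
    logits = -d2 / (2.0 * sigma ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def snpc_fit(X, y, protos_per_class=2, epochs=100, lr=0.1,
             sigma_start=2.0, sigma_end=0.2, seed=0):
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    # initialize each class's prototypes on random training points of that class
    protos, labels = [], []
    for c in classes:
        idx = rng.choice(np.flatnonzero(y == c), protos_per_class, replace=False)
        protos.append(X[idx])
        labels.append(np.full(protos_per_class, c))
    prototypes = np.concatenate(protos).astype(float)
    proto_labels = np.concatenate(labels)

    for epoch in range(epochs):
        # anneal the kernel width geometrically from sigma_start to sigma_end
        t = epoch / max(epochs - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** t

        P = soft_assignments(X, prototypes, sigma)            # (N, J)
        wrong = proto_labels[None, :] != y[:, None]           # (N, J) mask
        # per-sample loss: probability mass assigned to wrong-class prototypes
        loss_i = (P * wrong).sum(axis=1)                      # (N,)
        # gradient via softmax differentiation:
        # dL_i/dtheta_j = P_ij * (wrong_ij - L_i) * (x_i - theta_j) / sigma^2
        coef = P * (wrong - loss_i[:, None]) / sigma ** 2     # (N, J)
        grad = coef.T @ X - coef.sum(axis=0)[:, None] * prototypes
        prototypes -= lr * grad / len(X)
    return prototypes, proto_labels

def snpc_predict(X, prototypes, proto_labels):
    """Hard nearest-prototype decision at test time."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return proto_labels[d2.argmin(axis=1)]
```

The annealing mirrors the abstract's point about the dispersion parameter: with a large sigma, many prototypes receive appreciable assignment probability and are updated smoothly, while as sigma shrinks the updates concentrate on the prototypes nearest each training point, approaching hard LVQ-style winner updates.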
