Sample selection via clustering to construct support vector-like classifiers

This paper explores the possibility of constructing RBF classifiers that, somewhat like support vector machines, use a reduced number of samples as centroids, selecting those samples directly. Because sample selection is viewed as a computationally hard problem, the selection is carried out after a preliminary vector quantization; this also yields other, similar machines whose centroids are selected from those learned in a supervised manner. Several ways of designing these machines are considered, in particular with respect to sample selection, as well as several criteria for training them. Simulation results on well-known classification problems show very good performance for the corresponding designs, which improve on support vector machines while substantially reducing the number of units. This justifies our interest in selecting samples (or centroids) efficiently. These experiments and discussions open many new research avenues, as suggested in our conclusions.
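
The general idea of quantizing first and then selecting samples as centroids can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's procedure: plain k-means stands in for the vector quantization step, the training sample nearest to each codebook vector is taken as a centroid, the output weights are fitted by least squares, and the names `kmeans`, `select_samples`, `rbf_features`, `fit_rbf`, `predict`, `k_per_class` and `width` are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means, used here as a stand-in for vector quantization."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster where it is
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def select_samples(X, centers):
    """Index of the training sample closest to each codebook vector."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.unique(d.argmin(axis=0))

def rbf_features(X, centroids, width):
    """Gaussian RBF activations plus a constant bias column."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.exp(-d / (2.0 * width ** 2)), np.ones((len(X), 1))])

def fit_rbf(X, y, k_per_class=3, width=1.0):
    """Quantize each class, pick real samples as centroids, fit weights."""
    idx = []
    for c in np.unique(y):
        members = np.flatnonzero(y == c)
        centers = kmeans(X[members], k_per_class)
        idx.extend(members[select_samples(X[members], centers)])
    centroids = X[np.array(idx)]
    # Least-squares output weights: an assumed training criterion.
    w, *_ = np.linalg.lstsq(rbf_features(X, centroids, width), y, rcond=None)
    return centroids, w

def predict(X, centroids, w, width=1.0):
    return np.sign(rbf_features(X, centroids, width) @ w)

# Toy two-class problem with labels in {-1, +1} (synthetic data).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
centroids, w = fit_rbf(X, y)
accuracy = (predict(X, centroids, w) == y).mean()
print(len(centroids), "centroids, training accuracy:", accuracy)
```

In this sketch every centroid is an actual training sample, which is what makes the resulting machine resemble a support vector classifier with few units; the selection rule and training criteria studied in the paper may differ.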
