Voronoi networks and their probability of misclassification

Nearest neighbor classifiers that use all the training samples for classification require large memory and incur a high online computational cost at test time. To reduce the memory requirements and the computational cost, many algorithms have been developed that perform nearest neighbor classification using only a small number of representative samples obtained from the training set. We call the classification model underlying all these algorithms Voronoi networks (Vnets), because these algorithms discretize the feature space into Voronoi regions and assign all samples in each region to a single class. In this paper we analyze the generalization capabilities of these networks by bounding the generalization error. The class of problems that can be "efficiently" solved by Vnets is characterized by the extent to which the set of points on the decision boundary fills the feature space; this provides a quantitative measure of how efficiently a given problem can be solved using Vnets. We show that Vnets asymptotically converge to the Bayes classifier with arbitrarily high probability provided the number of representative samples grows more slowly than the square root of the number of training samples, and we also derive the optimal growth rate of the number of representative samples. We repeat the analysis for decision tree (DT) classifiers and compare them with Vnets. The bias/variance dilemma and the curse of dimensionality for Vnets and DTs are also discussed.
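
The sketch below illustrates the classification model described in the abstract: a small set of labelled prototypes partitions the feature space into Voronoi regions, and a query point receives the label of its nearest prototype. The prototype-selection step shown here (per-class k-means) is only an illustrative assumption; the paper's analysis applies to any algorithm that produces such representative samples.

```python
# Minimal sketch of a Voronoi network (Vnet): nearest-prototype classification.
# The per-class k-means used to pick prototypes is an assumption for illustration,
# not the method analyzed in the paper.
import numpy as np


def kmeans(X, k, iters=50, rng=None):
    """Plain k-means, used here only to pick k representative samples."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers


class VoronoiNetwork:
    """Nearest-prototype classifier: Voronoi regions labelled by class."""

    def fit(self, X, y, prototypes_per_class=3, rng=0):
        protos, proto_labels = [], []
        for c in np.unique(y):
            centers = kmeans(X[y == c], prototypes_per_class, rng=rng)
            protos.append(centers)
            proto_labels.extend([c] * len(centers))
        self.prototypes_ = np.vstack(protos)
        self.proto_labels_ = np.array(proto_labels)
        return self

    def predict(self, X):
        # Each query point gets the label of the Voronoi region it falls in,
        # i.e. the label of its nearest prototype.
        d = np.linalg.norm(X[:, None, :] - self.prototypes_[None, :, :], axis=2)
        return self.proto_labels_[d.argmin(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two Gaussian blobs as a toy two-class problem.
    X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
    y = np.array([0] * 200 + [1] * 200)
    vnet = VoronoiNetwork().fit(X, y, prototypes_per_class=2)
    print("training accuracy:", (vnet.predict(X) == y).mean())
```

Note that memory and test-time cost scale with the number of prototypes rather than the number of training samples, which is the motivation for Vnets stated in the abstract.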
