Learning bounds via sample width for classifiers on finite metric spaces

A recent paper [M. Anthony, J. Ratsaby, Maximal width learning of binary functions, Theoretical Computer Science 411 (2010) 138-147] introduced the notion of sample width for binary classifiers defined on the real line and showed that the performance of such classifiers can be quantified in terms of this width. This paper generalizes the notion of sample width so that it applies to classifiers defined on a finite metric space. By relating the learning problem to one involving the domination numbers of certain graphs, we obtain generalization error bounds that depend on the sample width and on certain measures of 'density' of the underlying metric space. We also discuss how a greedy set-covering heuristic can be used to bound the generalization error.
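As a rough illustration of the greedy set-covering idea mentioned above, the following sketch computes a dominating set of a threshold graph built on a finite metric space. It rests on assumptions that are not taken from the paper: the graph joins two points whenever their distance is at most `gamma`, and the names `greedy_dominating_set`, `dist` and `gamma` are hypothetical. The size of the returned set is an upper bound on the domination number of that graph; this is a minimal sketch and not the authors' construction.

```python
def greedy_dominating_set(dist, gamma):
    """Greedily pick points until every point is within gamma of a chosen one.

    dist  : square list-of-lists distance matrix of a finite metric space
            (assumed symmetric with zeros on the diagonal)
    gamma : non-negative threshold; points at distance <= gamma are treated
            as adjacent (an assumption made for this sketch)
    """
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    n = len(dist)
    # Closed neighbourhood of i: every point within gamma of i, including i.
    balls = [{j for j in range(n) if dist[i][j] <= gamma} for i in range(n)]
    uncovered = set(range(n))
    chosen = []
    while uncovered:
        # Greedy set-cover step: take the ball covering the most uncovered points.
        best = max(range(n), key=lambda i: len(balls[i] & uncovered))
        chosen.append(best)
        uncovered -= balls[best]
    return chosen


# Six equally spaced points on a line, with the absolute-difference metric.
points = [0, 1, 2, 3, 4, 5]
dist = [[abs(a - b) for b in points] for a in points]
print(greedy_dominating_set(dist, 1))  # prints [1, 4]
```

In this toy example the greedy choice happens to be optimal; in general the greedy heuristic only guarantees a cover within a logarithmic factor of the smallest one.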

[1] Harry B. Hunt, et al. Simple heuristics for unit disk graphs, 1995, Networks.

[2] Vladimir Vapnik, et al. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[3] F. Tian, et al. Bounds of Laplacian spectrum of graphs based on the domination number, 2005.

[4] Mei Lu, et al. Lower bounds of the Laplacian spectrum of graphs based on diameter, 2007.

[5] László Lovász, et al. On the ratio of optimal integral and fractional covers, 1975, Discret. Math.

[6] Lutz Volkmann, et al. Upper bounds on the domination number of a graph in terms of order, diameter and minimum degree, 2006, Australas. J Comb.

[7] Charles J. Colbourn, et al. Unit disk graphs, 1991, Discret. Math.

[8] Martin Anthony, et al. Maximal width learning of binary functions, 2010, Theor. Comput. Sci.

[9] Vasek Chvátal, et al. A Greedy Heuristic for the Set-Covering Problem, 1979, Math. Oper. Res.

[10] A. W. van der Vaart, et al. Uniform Central Limit Theorems, 2001.

[11] C. Berge. Graphes et hypergraphes, 1970.

[12] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[13] Dánut Marcu. An upper bound on the domination number of a graph, 1986.

[14] Peter L. Bartlett, et al. Function Learning from Interpolation, 1995, Combinatorics, Probability and Computing.

[15] Dieter Rautenbach. A note on domination, girth and minimum degree, 2008, Discret. Math.

[16] David Haussler, et al. Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[17] D. Pollard. Convergence of stochastic processes, 1984.

[18] Martin Anthony, et al. Using boxes and proximity to classify data into several categories, 2012.

[19] John Shawe-Taylor, et al. Structural Risk Minimization Over Data-Dependent Hierarchies, 1998, IEEE Trans. Inf. Theory.

[20] Peter L. Bartlett, et al. Learning in Neural Networks: Theoretical Foundations, 1999.

[21] Peter L. Bartlett, et al. The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network, 1998, IEEE Trans. Inf. Theory.

[22] Martin Anthony, et al. Robust cutpoints in the logical analysis of numerical data, 2012, Discret. Appl. Math.

[23] Martin Anthony, et al. The performance of a new hybrid classifier based on boxes and nearest neighbors, 2012, ISAIM.

[24] Peter L. Bartlett, et al. Neural Network Learning - Theoretical Foundations, 1999.

[25] P. Bartlett, et al. Function Learning from Interpolation, 2000, Combinatorics, Probability and Computing.