Consistency and Localizability

We show that every consistent learning method---that is, every method that asymptotically achieves the lowest possible expected loss for any distribution on (X, Y)---is necessarily localizable, by which we mean that its response at a particular point does not change significantly when it is shown only the part of the training set that lies close to that point. This holds in particular for methods that appear to be defined in a non-local manner, such as support vector machines in classification and least-squares estimators in regression. Beyond showing that consistency implies this specific form of localizability, we show that consistency is logically equivalent to the conjunction of two properties: (1) a form of localizability, and (2) that the method's global mean (over the entire X distribution) correctly estimates the true mean. Consistency can therefore be seen as composed of two aspects, one local and one global.
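The notion of localizability can be illustrated empirically. The following sketch (an illustration of the idea, not the paper's construction; the polynomial features, locality radius, and regularization strength are arbitrary choices) fits a seemingly global method, ridge regression, once on the full training set and once on only the training points near a query point x0, and compares the two predictions at x0:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data: y = sin(x) + noise.
n = 2000
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

def ridge_predict(X_tr, y_tr, x0, lam=1e-3, degree=5):
    """Closed-form ridge regression on polynomial features, evaluated at x0."""
    def feats(x):
        return np.column_stack([x**k for k in range(degree + 1)])
    A = feats(X_tr[:, 0])
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_tr)
    return (feats(np.array([x0])) @ w)[0]

x0 = 0.5
radius = 0.5                      # hypothetical locality radius
near = np.abs(X[:, 0] - x0) < radius

global_pred = ridge_predict(X, y, x0)          # trained on everything
local_pred = ridge_predict(X[near], y[near], x0)  # trained on nearby points only

print(global_pred, local_pred)    # both should lie near sin(0.5)
```

With enough data, the globally trained and locally trained predictions at x0 nearly coincide, which is the behavior the abstract asserts must hold for any consistent method.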
