Universal consistency and rates of convergence of multiclass prototype algorithms in metric spaces

We study universal consistency and convergence rates of simple nearest-neighbor prototype rules for the problem of multiclass classification in metric spaces. We first show that a novel data-dependent partitioning rule, named Proto-NN, is universally consistent in any metric space that admits a universally consistent rule. Proto-NN is a significant simplification of OptiNet, a recently proposed compression-based algorithm that, to date, was the only algorithm known to be universally consistent in such a general setting. Practically, Proto-NN is simpler to implement and enjoys reduced computational complexity. We then proceed to study convergence rates of the excess error probability. We first obtain rates for the standard $k$-NN rule under a margin condition and a new generalized-Lipschitz condition. The latter extends a recently proposed modified-Lipschitz condition from $\mathbb R^d$ to metric spaces. Like the modified-Lipschitz condition, the new condition avoids any boundedness assumptions on the data distribution. While obtaining rates for Proto-NN is left open, we show that a second prototype rule that hybridizes between $k$-NN and Proto-NN achieves the same rates as $k$-NN while enjoying computational advantages similar to those of Proto-NN. We conjecture, however, that like $k$-NN this hybrid rule is not universally consistent in general.
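To fix ideas, the standard multiclass $k$-NN rule discussed above can be sketched as follows. This is a minimal illustration of the classical rule only (not the paper's Proto-NN or the hybrid rule, whose constructions are not given in the abstract); the function names, the tie-breaking convention, and the Euclidean example metric are illustrative assumptions, and any metric $\rho$ on the instance space could be plugged in.

```python
import math
from collections import Counter

def knn_predict(train, k, dist, x):
    """Multiclass k-NN rule in a metric space: classify x by a
    plurality vote among its k nearest training points under the
    metric `dist`. Ties are broken by first-encountered label."""
    neighbors = sorted(train, key=lambda pt: dist(pt[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Example metric: Euclidean distance on R^2 (any metric works).
def euclidean(a, b):
    return math.dist(a, b)

# Toy three-class sample.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
         ((2.0, 0.0), "c"), ((2.1, 0.1), "c")]

print(knn_predict(train, 3, euclidean, (0.05, 0.1)))  # -> a
```

A prototype rule in the sense of the abstract replaces `train` by a much smaller, data-dependent set of prototype points before applying a nearest-neighbor vote, which is where the computational savings of Proto-NN and the hybrid rule come from.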
