Universal consistency and rates of convergence of multiclass prototype algorithms in metric spaces

We study universal consistency and convergence rates of simple nearest-neighbor prototype rules for the problem of multiclass classification in metric spaces. We first show that a novel data-dependent partitioning rule, named Proto-NN, is universally consistent in any metric space that admits a universally consistent rule. Proto-NN is a significant simplification of OptiNet, a recently proposed compression-based algorithm that, to date, was the only algorithm known to be universally consistent in such a general setting. Practically, Proto-NN is simpler to implement and enjoys reduced computational complexity. We then proceed to study convergence rates of the excess error probability. We first obtain rates for the standard $k$-NN rule under a margin condition and a new generalized-Lipschitz condition. The latter extends a recently proposed modified-Lipschitz condition from $\mathbb R^d$ to metric spaces. Like the modified-Lipschitz condition, the new condition avoids any boundedness assumptions on the data distribution. While obtaining rates for Proto-NN is left open, we show that a second prototype rule that hybridizes between $k$-NN and Proto-NN achieves the same rates as $k$-NN while enjoying computational advantages similar to those of Proto-NN. We conjecture, however, that like $k$-NN this hybrid rule is not universally consistent in general.
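To fix ideas, the standard multiclass $k$-NN rule discussed above can be sketched as follows. This is a minimal illustration of the classical rule only (not the paper's Proto-NN or the hybrid rule, whose constructions are not given in the abstract); the function names, the tie-breaking convention, and the Euclidean example metric are illustrative assumptions, and any metric $\rho$ on the instance space could be plugged in.

```python
import math
from collections import Counter

def knn_predict(train, k, dist, x):
    """Multiclass k-NN rule in a metric space: classify x by a
    plurality vote among its k nearest training points under the
    metric `dist`. Ties are broken by first-encountered label."""
    neighbors = sorted(train, key=lambda pt: dist(pt[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Example metric: Euclidean distance on R^2 (any metric works).
def euclidean(a, b):
    return math.dist(a, b)

# Toy three-class sample.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"),
         ((1.0, 1.0), "b"), ((0.9, 1.1), "b"),
         ((2.0, 0.0), "c"), ((2.1, 0.1), "c")]

print(knn_predict(train, 3, euclidean, (0.05, 0.1)))  # -> a
```

A prototype rule in the sense of the abstract replaces `train` by a much smaller, data-dependent set of prototype points before applying a nearest-neighbor vote, which is where the computational savings of Proto-NN and the hybrid rule come from.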
