Universal consistency of the k-NN rule in metric spaces and Nagata dimension

The $k$-nearest neighbour learning rule (with uniform distance tie-breaking) is universally consistent in every metric space $X$ that is sigma-finite dimensional in the sense of Nagata. This was pointed out by Cérou and Guyader (2006) as a consequence of their main result, combined with a theorem in real analysis sketched by D. Preiss (1971) (and elaborated in detail by Assouad and Quentin de Gromard (2006)). We show that a direct proof is possible along the same lines as the original theorem of Charles J. Stone (1977) on the universal consistency of the $k$-NN classifier in a finite-dimensional Euclidean space. The generalization is non-trivial because distance ties are more prevalent in the non-Euclidean setting; along the way, we investigate the relevant geometric properties of the metrics and the limitations of the Stone argument by constructing various examples.
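To make the object of study concrete, the following is a minimal sketch of a $k$-NN classifier with uniform distance tie-breaking: points strictly closer than the $k$-th smallest distance are always selected, and ties exactly at that distance are resolved by a uniformly random choice. All names here (`knn_predict`, the Euclidean `dist`) are illustrative, not from the paper; in the general metric setting of the abstract, any metric could be substituted for `dist`.

```python
import random
from collections import Counter

def knn_predict(train, query, k, rng=random):
    """Predict a label for `query` by majority vote among its k nearest
    neighbours in `train`, breaking distance ties uniformly at random.

    train: list of (point, label) pairs, where each point is a tuple of
    coordinates. Euclidean distance is used here for concreteness; any
    metric d(x, y) could replace it.
    """
    def dist(x, y):
        return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

    # Score every training point by its distance to the query.
    scored = [(dist(x, query), label) for x, label in train]
    scored.sort(key=lambda t: t[0])
    kth = scored[k - 1][0]  # the k-th smallest distance

    # Points strictly inside the k-th distance are always neighbours.
    inside = [lab for d, lab in scored if d < kth]

    # Points exactly at the k-th distance are tied; pick among them
    # uniformly at random until k neighbours are selected.
    boundary = [lab for d, lab in scored if d == kth]
    rng.shuffle(boundary)
    neighbours = inside + boundary[: k - len(inside)]

    # Majority vote among the selected labels.
    return Counter(neighbours).most_common(1)[0][0]
```

When no two training points are equidistant from the query, the boundary set has a single element and the rule reduces to the usual deterministic $k$-NN vote; the randomization matters only on ties, which, as the abstract notes, become the main obstacle outside the Euclidean setting.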

[1] László Györfi et al., A Probabilistic Theory of Pattern Recognition, Stochastic Modelling and Applied Probability, 1996.

[2] Peter E. Hart et al., Nearest neighbor pattern classification, IEEE Trans. Inf. Theory, 1967.

[3] Open problems left in my wake of research, 2005.

[4] J. Nagata, On a special metric and dimension, 1964.

[5] Patrice Assouad et al., Recouvrements, dérivation des mesures et dimensions, 2006.

[6] J. Nagata, Modern Dimension Theory, 1965.

[7] Ricardo Fraiman et al., Consistent Nonparametric Regression for Functional Data Under the Stone–Besicovitch Conditions, IEEE Transactions on Information Theory, 2012.

[8] Hubert Haoyang Duan et al., Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease, arXiv, 2014.

[9] L. Devroye, On the Almost Everywhere Convergence of Nonparametric Regression Function Estimates, 1981.

[10] C. J. Stone, Consistent Nonparametric Regression, 1977.

[11] Alexandros Nanopoulos et al., Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data, J. Mach. Learn. Res., 2010.

[12] P. Loeb et al., Lusin's Theorem and Bochner Integration, 2004, math/0406370.

[13] Arnaud Guyader et al., Nearest neighbor classification in infinite dimension, 2006.

[14] Stan Hatko, k-Nearest Neighbour Classification of Datasets with a Family of Distances, arXiv, 2015.

[15] Kilian Q. Weinberger et al., Distance Metric Learning for Large Margin Nearest Neighbor Classification, NIPS, 2005.

[16] David Preiss, Invalid Vitali theorems, 1979.