k-NN Regression on Functional Data with Incomplete Observations

We study a general form of regression in which each covariate is itself a functional object, such as a distribution or a function. In real applications, however, we typically lack direct access to such data; only noisy estimates of the true covariate functions or distributions are available. For example, when each covariate is a distribution, we may not be able to observe the distribution directly, but we can assume that an i.i.d. sample set from it is available. We present a general framework and a k-NN based estimator for this regression problem. We prove consistency of the estimator and derive its convergence rates. We further show that the proposed estimator adapts to the local intrinsic dimension in this setting, and we provide a simple approach for choosing k. Finally, we illustrate the applicability of the framework with numerical experiments.
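To make the setting concrete, the following is a minimal sketch of k-NN regression when each covariate is a distribution seen only through an i.i.d. sample set. It is an illustration under stated assumptions, not the estimator analyzed in the paper: the histogram density estimate, the discretized L2 distance, the support [0, 1], and the names `histogram_density` and `knn_predict` are all choices made here for the example.

```python
import numpy as np


def histogram_density(sample, bins=20, support=(0.0, 1.0)):
    """Noisy density estimate of one covariate distribution from its sample set.

    Assumption: the distribution is one-dimensional and supported on `support`.
    """
    density, _ = np.histogram(sample, bins=bins, range=support, density=True)
    return density


def knn_predict(train_samples, train_y, query_sample, k=3, bins=20,
                support=(0.0, 1.0)):
    """Average the responses of the k training covariates closest to the query.

    Closeness is measured between the *estimated* densities, since the true
    covariate distributions are never observed directly.
    """
    width = (support[1] - support[0]) / bins
    train_dens = np.array(
        [histogram_density(s, bins, support) for s in train_samples]
    )
    query_dens = histogram_density(query_sample, bins, support)
    # Discretized L2 distance between the estimated densities (an assumption).
    dists = np.sqrt(np.sum((train_dens - query_dens) ** 2, axis=1) * width)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(np.asarray(train_y)[nearest]))


# Toy usage: each covariate is a Beta(a, 2) distribution observed through
# 200 draws, and the response is the mean a / (a + 2) of that distribution.
rng = np.random.default_rng(0)
shapes = rng.uniform(0.5, 5.0, size=50)
train_samples = [rng.beta(a, 2.0, size=200) for a in shapes]
train_y = shapes / (shapes + 2.0)

query_shape = 2.0
query_sample = rng.beta(query_shape, 2.0, size=200)
prediction = knn_predict(train_samples, train_y, query_sample, k=5)
print(prediction, query_shape / (query_shape + 2.0))
```

In this sketch both the number of neighbors k and the number of histogram bins are fixed by hand; the number of draws per sample set controls how noisy each estimated covariate is.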
