Functional Classification with Margin Conditions

Let $(X,Y)$ be an $\mathcal{X} \times \{0,1\}$-valued random pair and consider a sample $(X_1,Y_1),\ldots,(X_n,Y_n)$ drawn from the distribution of $(X,Y)$. We aim at constructing from this sample a classifier, that is, a function that predicts the value of $Y$ from the observation of $X$. The special case where $\mathcal{X}$ is a functional space is of particular interest because of the so-called curse of dimensionality. In a recent paper, Biau et al. [1] propose to filter the $X_i$'s in the Fourier basis and to apply the classical $k$-Nearest Neighbor rule to the first $d$ coefficients of the expansion. Both $k$ and $d$ are selected automatically via a penalized criterion. We extend this study and note that the penalty used by Biau et al. is too heavy when one considers the minimax point of view under some margin-type assumptions. We prove that using a penalty of smaller order, or even equal to zero, is preferable both in theory and in practice. Our experimental study furthermore shows that introducing a penalty of small order stabilizes the selection process while preserving rather good performance.
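
To make the procedure concrete, here is a minimal Python sketch, not the authors' implementation: each discretized curve is projected on its first $d$ Fourier coefficients, the $k$-Nearest Neighbor rule is applied to those coefficients, and $(k,d)$ is chosen by minimizing an estimated misclassification error plus a penalty term. The cross-validated error estimate and the placeholder penalty `penalty_weight * d / n` are illustrative assumptions, not the criterion analyzed in the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def fourier_coefficients(curves, d):
    """Project each discretized curve (one row) on its first d Fourier coefficients."""
    c = np.fft.rfft(curves, axis=1)
    # Interleave real and imaginary parts so that low frequencies come first.
    feats = np.empty((c.shape[0], 2 * c.shape[1]))
    feats[:, 0::2] = c.real
    feats[:, 1::2] = c.imag
    return feats[:, :d]


def select_k_d(curves, labels, k_grid, d_grid, penalty_weight=0.0):
    """Pick (k, d) minimizing cross-validated error + penalty_weight * d / n (placeholder penalty)."""
    n = len(labels)
    best_pair, best_score = None, np.inf
    for d in d_grid:
        Z = fourier_coefficients(curves, d)
        for k in k_grid:
            err = 1.0 - cross_val_score(
                KNeighborsClassifier(n_neighbors=k), Z, labels, cv=5
            ).mean()
            score = err + penalty_weight * d / n  # zero or small-order penalty
            if score < best_score:
                best_pair, best_score = (k, d), score
    return best_pair


# Usage on synthetic curves: n noisy sinusoids whose amplitude depends on the label.
rng = np.random.default_rng(0)
n, grid_size = 200, 128
t = np.linspace(0.0, 1.0, grid_size)
labels = rng.integers(0, 2, size=n)
curves = (1.0 + labels[:, None]) * np.sin(2 * np.pi * t) + rng.normal(scale=0.5, size=(n, grid_size))
k_star, d_star = select_k_d(curves, labels, k_grid=[1, 3, 5, 7], d_grid=[2, 4, 8, 16])
print(f"selected k = {k_star}, d = {d_star}")
```

Setting `penalty_weight = 0` corresponds to the zero-penalty selection discussed above, while a small positive value mimics a penalty of small order.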

[1] P. Massart et al., Concentration inequalities and model selection, 2007.

[2] A. Tsybakov et al., Fast learning rates for plug-in classifiers, 2005, arXiv:0708.2321.

[3] V. Vapnik, Estimation of Dependences Based on Empirical Data, 2006.

[4] R. Tibshirani et al., Penalized Discriminant Analysis, 1995.

[5] Peter Hall et al., A Functional Data-Analytic Approach to Signal Discrimination, 2001, Technometrics.

[6] Yuhong Yang, Can the Strengths of AIC and BIC Be Shared?, 2005.

[7] L. Devroye et al., An equivalence theorem for L1 convergence of the kernel regression estimate, 1989.

[8] Sanjeev R. Kulkarni et al., Rates of convergence of nearest neighbor estimation under arbitrary sampling, 1995, IEEE Trans. Inf. Theory.

[9] Henry W. Altland et al., Applied Functional Data Analysis, 2003, Technometrics.

[10] P. Massart et al., Risk bounds for statistical learning, 2007, arXiv:math/0702683.

[11] Robert Tibshirani et al., Discriminant Adaptive Nearest Neighbor Classification, 1995, IEEE Trans. Pattern Anal. Mach. Intell.

[12] Christine Tuleau, Sélection de variables pour la discrimination en grande dimension et classification de données fonctionnelles, 2005.

[13] L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, Springer, 1996.

[14] Florentina Bunea et al., Functional classification in Hilbert spaces, 2005, IEEE Transactions on Information Theory.

[15] Gábor Lugosi et al., Pattern Classification and Learning Theory, 2002.

[16] C. J. Stone et al., Consistent Nonparametric Regression, 1977.

[17] F. Rossi et al., Classification in Hilbert Spaces with Support Vector Machines, 2005.

[18] A. Tsybakov et al., Optimal aggregation of classifiers in statistical learning, 2003.

[19] Nicolas W. Hengartner et al., Bandwidth selection for local linear regression smoothers, 2002.

[20] Luc Devroye et al., Lower bounds in pattern recognition and learning, 1995, Pattern Recognit.

[21] Michael I. Jordan et al., Convexity, Classification, and Risk Bounds, 2006.

[22] László Györfi et al., A Probabilistic Theory of Pattern Recognition, 1996, Stochastic Modelling and Applied Probability.

[23] L. Györfi, Principles of nonparametric learning, 2002.

[24] B. Silverman et al., Functional Data Analysis, 1997.

[25] L. Rouviere, Functional Learning with Wavelets, 2005.

[26] P. Massart et al., Minimum contrast estimators on sieves: exponential bounds and rates of convergence, 1998.

[27] Gérard Biau et al., On the Kernel Rule for Function Classification, 2006.

[28] S. Boucheron et al., Theory of classification: a survey of some recent advances, 2005.

[29] A. Davison et al., Report of the Editors—2001, 2002.

[30] L. Devroye, Nonparametric Discrimination and Density Estimation, 1976.

[31] E. Mammen et al., Smooth Discrimination Analysis, 1999.

[32] Frédéric Ferraty et al., Curves discrimination: a nonparametric functional approach, 2003, Comput. Stat. Data Anal.

[33] Arnaud Guyader et al., Nearest neighbor classification in infinite dimension, 2006.

[34] David Haussler et al., Learnability and the Vapnik-Chervonenkis dimension, 1989, JACM.

[35] Vladimir Vapnik and Alexey Chervonenkis, On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[36] J. O. Ramsay et al., Functional Data Analysis (Springer Series in Statistics), 1997.

[37] David Haussler et al., Predicting {0,1}-functions on randomly drawn points, 1988, COLT '88.