A Finite-Sample Generalization Bound for Semiparametric Regression: Partially Linear Models

In this paper we provide generalization bounds for semiparametric regression with so-called partially linear models, in which the regression function is the sum of a linear parametric function and a nonlinear, nonparametric function, the latter taken from some set H with finite entropy integral. The problem is technically challenging because the parametric part is unconstrained and the model is underdetermined, while the response is allowed to be unbounded with subgaussian tails. Under natural regularity conditions, we bound the generalization error as a function of the metric entropy of H and the dimension of the linear model. Our main tool is a ratio-type concentration inequality for increments of empirical processes, which we use to derive an exponential tail bound on the size of the parametric component. We also compare this technique with alternatives and discuss why and when the unconstrained parametric part of the model may cause a problem in terms of the expected risk. Finally, we explain by means of a specific example why this problem cannot be detected using the classical asymptotic analysis often seen in the statistics literature.
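
For orientation, the model class described above can be written out as follows. This is only a sketch under assumed notation: the symbols x, z, \theta, d, h and the choice of norm are illustrative and are not taken from the paper, and the entropy integral is given in its usual Dudley-type form, with N(\varepsilon, H, \|\cdot\|) denoting the \varepsilon-covering number of H.

```latex
% Partially linear regression function (notation assumed for illustration):
% an unconstrained linear part plus a nonparametric part drawn from H.
f(x, z) = \theta^\top x + h(z), \qquad \theta \in \mathbb{R}^d, \quad h \in H.

% Finite entropy integral of H, in the usual Dudley-type form
% (the norm and the upper integration limit are assumptions here):
\int_0^{1} \sqrt{\log N(\varepsilon, H, \|\cdot\|)} \, d\varepsilon < \infty.
```

In this notation, "unconstrained" means that no bound on \|\theta\| is imposed in advance, which is why a tail bound on the size of the parametric component is needed.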
