Early stopping for non-parametric regression: An optimal data-dependent stopping rule
Martin J. Wainwright | Bin Yu | Garvesh Raskutti
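The stopping rule analyzed in this paper halts gradient descent on the least-squares loss over a reproducing kernel Hilbert space at an iteration computed from the eigenvalues of the empirical kernel matrix. The sketch below is a minimal, non-authoritative illustration of such a procedure, assuming a Gaussian kernel, a known noise level `sigma`, and a threshold of the rough form studied in the paper; the function names (`rbf_kernel`, `early_stopped_kernel_gd`), the bandwidth, and the constants are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rbf_kernel(X, bandwidth=1.0):
    """Gaussian kernel matrix; the bandwidth is an illustrative choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def local_complexity(mu, eps):
    """Localized empirical kernel complexity sqrt(mean(min(mu_i, eps^2))),
    computed from the eigenvalues mu_i of the normalized kernel matrix K/n."""
    return np.sqrt(np.mean(np.minimum(mu, eps ** 2)))

def early_stopped_kernel_gd(X, y, sigma=1.0, eta=None, max_iter=10_000):
    """Gradient descent on the empirical least-squares loss in an RKHS,
    halted by a data-dependent, eigenvalue-based rule (sketch only)."""
    n = len(y)
    K = rbf_kernel(X)
    mu = np.clip(np.linalg.eigvalsh(K / n), 0.0, None)
    if eta is None:
        eta = 1.0 / mu.max()          # step size at most 1 / largest eigenvalue
    alpha = np.zeros(n)               # f_t(.) = sum_i alpha_i k(., x_i)
    for t in range(1, max_iter + 1):
        # Function-space gradient (L2-boosting / Landweber) step.
        alpha += (eta / n) * (y - K @ alpha)
        # Stop at the first t where the localized complexity at scale
        # 1/sqrt(eta*t) exceeds a threshold shrinking like 1/(sigma*eta*t);
        # the constant 2e follows the paper's recipe but is an assumption here.
        eps_t = 1.0 / np.sqrt(eta * t)
        if local_complexity(mu, eps_t) > 1.0 / (2.0 * np.e * sigma * eta * t):
            break
    return alpha, t

# Toy usage: noisy sine observations.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(100, 1))
y = np.sin(3.0 * X[:, 0]) + 0.3 * rng.standard_normal(100)
alpha, T = early_stopped_kernel_gd(X, y, sigma=0.3)
```

Because the complexity term decays roughly like 1/sqrt(t) while the threshold decays like 1/t, the rule always fires after finitely many iterations, trading off bias (too few steps) against variance (too many).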
[1] J. Mercer. Functions of Positive and Negative Type, and their Connection with the Theory of Integral Equations, 1909.
[2] N. Aronszajn. Theory of Reproducing Kernels, 1950.
[3] M. Birman, et al. Piecewise-polynomial approximations of functions of the classes $W_p^{\alpha}$, 1967.
[4] G. Wahba, et al. Some results on Tchebycheffian spline functions, 1971.
[5] F. T. Wright. A Bound on Tail Probabilities for Quadratic Forms in Independent Random Variables Whose Distributions are not Necessarily Symmetric, 1973.
[6] O. Strand. Theory and methods related to the singular-function expansion and Landweber's iteration for integral equations of the first kind, 1974.
[7] Luc Devroye, et al. Distribution-free inequalities for the deleted and holdout error estimates, 1979, IEEE Trans. Inf. Theory.
[8] C. Stein. Estimation of the Mean of a Multivariate Normal Distribution, 1981.
[9] P. M. Prenter, et al. A formal comparison of methods proposed for the numerical solution of first kind integral equations, 1981, The Journal of the Australian Mathematical Society. Series B. Applied Mathematics.
[10] H. Weinert. Reproducing kernel Hilbert spaces: Applications in statistical signal processing, 1982.
[11] C. J. Stone, et al. Additive Regression and Other Nonparametric Models, 1985.
[12] Grace Wahba, et al. Three topics in ill-posed problems, 1987.
[13] Saburou Saitoh, et al. Theory of Reproducing Kernels and Its Applications, 1988.
[14] Hervé Bourlard, et al. Generalization and Parameter Estimation in Feedforward Nets: Some Experiments, 1989, NIPS.
[15] J. Marron, et al. On variance estimation in nonparametric regression, 1990.
[16] G. Wahba. Spline models for observational data, 1990.
[17] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[18] Alexander J. Smola, et al. Learning with kernels, 1998.
[19] Peter L. Bartlett, et al. Boosting Algorithms as Gradient Descent, 1999, NIPS.
[20] Yuhong Yang, et al. Information-theoretic determination of minimax rates of convergence, 1999.
[21] V. Buldygin, et al. Metric characterization of random variables and random processes, 2000.
[22] S. van de Geer. Empirical Processes in M-Estimation, 2000.
[23] Peter L. Bartlett, et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results, 2003, J. Mach. Learn. Res.
[24] P. Bühlmann, et al. Boosting with the L2-loss: regression and classification, 2001.
[25] M. Ledoux. The concentration of measure phenomenon, 2001.
[26] B. Yu, et al. Boosting with the L2-loss: regression and classification, 2001.
[27] S. R. Jammalamadaka. Empirical Processes in M-Estimation, 2001.
[28] Shahar Mendelson, et al. Geometric Parameters of Kernel Machines, 2002, COLT.
[29] Chong Gu. Smoothing Spline ANOVA Models, 2002.
[30] P. Bühlmann, et al. Boosting with the L2 loss: regression and classification, 2003.
[31] Wenxin Jiang. Process consistency for AdaBoost, 2003.
[32] Chong Gu. Model diagnostics for smoothing spline ANOVA models, 2004.
[33] Bogdan E. Popescu, et al. Gradient Directed Regularization, 2004.
[34] Philip D. Plowright, et al. Convexity, 2019, Optimization for Chemical and Biochemical Engineering.
[35] Petros Drineas, et al. On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning, 2005, J. Mach. Learn. Res.
[36] Bin Yu, et al. Boosting with early stopping: Convergence and consistency, 2005, math/0508276.
[37] P. Bartlett, et al. Local Rademacher complexities, 2005, math/0508275.
[38] Tong Zhang. Learning Bounds for Kernel Regression Using Effective Data Dimensionality, 2005, Neural Computation.
[39] Michael I. Jordan, et al. Convexity, Classification, and Risk Bounds, 2006.
[40] A. Caponnetto. Optimal Rates for Regularization Operators in Learning Theory, 2006.
[41] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization, 2006, 0708.0083.
[42] Y. Yao, et al. Adaptation for Regularization Operators in Learning Theory, 2006.
[43] Peter L. Bartlett, et al. AdaBoost is Consistent, 2006, J. Mach. Learn. Res.
[44] Lorenzo Rosasco, et al. On regularization algorithms in learning theory, 2007, J. Complex.
[45] Y. Yao, et al. On Early Stopping in Gradient Descent Learning, 2007.
[46] A. Barron, et al. Approximation and learning by greedy algorithms, 2008, 0803.1718.
[47] Lorenzo Rosasco, et al. Adaptive Kernel Methods Using the Balancing Principle, 2010, Found. Comput. Math.
[48] Gilles Blanchard, et al. Optimal learning rates for Kernel Conjugate Gradient regression, 2010, NIPS.
[49] Y. Yao, et al. Cross-validation based adaptation for regularization operators in learning theory, 2010.
[50] Michael W. Mahoney, et al. Implementing regularization implicitly via approximate eigenvector computation, 2010, ICML.
[51] Martin J. Wainwright, et al. Minimax-optimal rates for sparse additive models over kernel classes via convex programming, 2010, J. Mach. Learn. Res.