Contributions à l'apprentissage statistique : estimation de densité, agrégation d'experts et forêts aléatoires. (Contributions to statistical learning: density estimation, expert aggregation and random forests)
[1] A. Tsybakov,et al. Aggregation for Gaussian regression , 2007, 0710.3654.
[2] Francesco Orabona,et al. Improved Strongly Adaptive Online Learning using Coin Betting , 2016, AISTATS.
[3] Anja Vogler,et al. An Introduction to Multivariate Statistical Analysis , 2004 .
[4] Tin Kam Ho,et al. The Random Subspace Method for Constructing Decision Forests , 1998, IEEE Trans. Pattern Anal. Mach. Intell..
[5] Horst Bischof,et al. On-line Random Forests , 2009, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops.
[6] Geoff Hulten,et al. Mining high-speed data streams , 2000, KDD '00.
[7] Wouter M. Koolen,et al. Combining Adversarial Guarantees and Stochastic Fast Rates in Online Learning , 2016, NIPS.
[8] Robert B. Gramacy,et al. Dynamic Trees for Learning and Design , 2009, 0912.1586.
[9] Joel A. Tropp,et al. An Introduction to Matrix Concentration Inequalities , 2015, Found. Trends Mach. Learn..
[10] Michael W. Mahoney,et al. A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares , 2014, J. Mach. Learn. Res..
[11] V. Marčenko,et al. DISTRIBUTION OF EIGENVALUES FOR SOME SETS OF RANDOM MATRICES , 1967 .
[12] M. Rudelson,et al. The Littlewood-Offord problem and invertibility of random matrices , 2007, math/0703503.
[13] Massimiliano Pontil,et al. Online Gradient Descent Learning Algorithms , 2008, Found. Comput. Math..
[14] Leo Breiman,et al. Bagging Predictors , 1996, Machine Learning.
[15] Paul E. Utgoff,et al. Incremental Induction of Decision Trees , 1989, Machine Learning.
[16] Wouter M. Koolen,et al. Universal Codes From Switching Strategies , 2013, IEEE Transactions on Information Theory.
[17] T. Poggio,et al. STABILITY RESULTS IN LEARNING THEORY , 2005 .
[18] Vladimir Vovk,et al. Aggregating strategies , 1990, COLT '90.
[19] Wouter M. Koolen,et al. Putting Bayes to sleep , 2012, NIPS.
[20] Mark D. Reid,et al. Fast rates in statistical and online learning , 2015, J. Mach. Learn. Res..
[21] O. Catoni. The Mixture Approach to Universal Model Selection , 1997 .
[22] Stefan Wager,et al. High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification , 2015, 1507.03003.
[23] Luc Devroye,et al. Lower bounds in pattern recognition and learning , 1995, Pattern Recognit..
[24] Yali Amit,et al. Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.
[25] J. Hartigan. The maximum likelihood prior , 1998 .
[26] P. Massart,et al. Rates of convergence for minimum contrast estimators , 1993 .
[27] T. Tao,et al. From the Littlewood-Offord problem to the Circular Law: universality of the spectral distribution of random matrices , 2008, 0810.2994.
[28] Jason M. Klusowski. Complete Analysis of a Random Forest Model , 2018, ArXiv.
[29] Arnaud Guyader,et al. On the Rate of Convergence of the Bagged Nearest Neighbor Estimate , 2010, J. Mach. Learn. Res..
[30] Alessandro Lazaric,et al. Exploiting easy data in online optimization , 2014, NIPS.
[31] F. Komaki. On asymptotic properties of predictive distributions , 1996 .
[32] J. W. Silverstein,et al. Spectral Analysis of Large Dimensional Random Matrices , 2009 .
[33] Claudio Gentile,et al. Adaptive and Self-Confident On-Line Learning Algorithms , 2000, J. Comput. Syst. Sci..
[34] Boris Ryabko,et al. Prediction of random sequences and universal coding , 2015 .
[35] Shai Shalev-Shwartz,et al. Average Stability is Invariant to Data Preconditioning. Implications to Exp-concave Empirical Risk Minimization , 2016, J. Mach. Learn. Res..
[36] André Elisseeff,et al. Stability and Generalization , 2002, J. Mach. Learn. Res..
[37] Vladimir Vovk,et al. Derandomizing Stochastic Prediction Strategies , 1997, COLT '97.
[38] Alexandre B. Tsybakov,et al. Introduction to Nonparametric Estimation , 2008, Springer series in statistics.
[39] Lorenzo Rosasco,et al. Model Selection for Regularized Least-Squares Algorithm in Learning Theory , 2005, Found. Comput. Math..
[40] C. J. Stone,et al. Optimal Rates of Convergence for Nonparametric Estimators , 1980 .
[41] P. Massart,et al. Concentration inequalities and model selection , 2007 .
[42] Gilles Stoltz,et al. Fano's inequality for random variables , 2017, Statistical Science.
[43] Eric R. Ziegel,et al. Generalized Linear Models , 2002, Technometrics.
[44] H. Robbins. A Stochastic Approximation Method , 1951 .
[45] R. Keener. Theoretical Statistics: Topics for a Core Course , 2010 .
[46] Elad Hazan,et al. Logarithmic regret algorithms for online convex optimization , 2006, Machine Learning.
[47] Mark W. Schmidt,et al. A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method , 2012, ArXiv.
[48] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.
[49] Feng Liang,et al. Exact minimax strategies for predictive density estimation, data compression, and model selection , 2002, IEEE Transactions on Information Theory.
[50] Vladimir Vovk,et al. A game of prediction with expert advice , 1995, COLT '95.
[51] M. Talagrand. New concentration inequalities in product spaces , 1996 .
[52] Dana Ron,et al. Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation , 1997, Neural Computation.
[53] László Györfi,et al. A simple randomized algorithm for sequential prediction of ergodic time series , 1999, IEEE Trans. Inf. Theory.
[54] Wouter M. Koolen,et al. Follow the leader if you can, hedge if you must , 2013, J. Mach. Learn. Res..
[55] A Tikhonov,et al. Solution of Incorrectly Formulated Problems and the Regularization Method , 1963 .
[56] Dmitrii Ostrovskii,et al. Finite-sample Analysis of M-estimators using Self-concordance , 2018, 1810.06838.
[57] R. Dudley. The Sizes of Compact Subsets of Hilbert Space and Continuity of Gaussian Processes , 1967 .
[58] Claudio Gentile,et al. Regret Minimization for Branching Experts , 2022 .
[59] R. Vershynin,et al. Covariance estimation for distributions with 2+ε moments , 2011, 1106.2775.
[60] Daniel M. Roy,et al. Bayesian Models of Graphs, Arrays and Other Exchangeable Random Structures , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[61] Ambuj Tewari,et al. On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization , 2008, NIPS.
[62] Y. Yin. Limiting spectral distribution for a class of random matrices , 1986 .
[63] Alexander J. Smola,et al. Learning with kernels , 1998 .
[64] Ohad Shamir,et al. The sample complexity of learning linear predictors with the squared loss , 2014, J. Mach. Learn. Res..
[65] M. Rudelson,et al. Small Ball Probabilities for Linear Images of High-Dimensional Distributions , 2014, 1402.4492.
[66] Yun Yang,et al. Bayesian regression tree ensembles that adapt to smoothness and sparsity , 2017, Journal of the Royal Statistical Society: Series B (Statistical Methodology).
[67] Thomas M. Cover,et al. Elements of Information Theory , 2005 .
[68] M. Ledoux. The concentration of measure phenomenon , 2001 .
[69] J. A. Díaz-García,et al. SENSITIVITY ANALYSIS IN LINEAR REGRESSION , 2022 .
[70] Robert E. Schapire,et al. Predicting Nearly As Well As the Best Pruning of a Decision Tree , 1995, COLT '95.
[71] Roman Vershynin,et al. Introduction to the non-asymptotic analysis of random matrices , 2010, Compressed Sensing.
[72] Tong Zhang. From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation , 2006, math/0702653.
[73] E. Candès,et al. The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression , 2018, The Annals of Statistics.
[74] Felipe Cucker,et al. On the mathematical foundations of learning , 2001 .
[75] Donna L. Mohr,et al. Multiple Regression , 2002, Encyclopedia of Autism Spectrum Disorders.
[76] Gilles Louppe,et al. Understanding Random Forests: From Theory to Practice , 2014, 1407.7502.
[77] Seshadhri Comandur,et al. Efficient learning algorithms for changing environments , 2009, ICML '09.
[78] Sylvain Arlot. Technical appendix to "V-fold cross-validation improved: V-fold penalization" , 2008, 0802.0566.
[79] Neri Merhav,et al. Universal Prediction , 1998, IEEE Trans. Inf. Theory.
[80] Z. Bai,et al. Limit of the smallest eigenvalue of a large dimensional sample covariance matrix , 1993 .
[81] Benjamin Recht,et al. Random Features for Large-Scale Kernel Machines , 2007, NIPS.
[82] L. Devroye. Necessary and sufficient conditions for the pointwise convergence of nearest neighbor regression function estimates , 1982 .
[83] David A. McAllester. PAC-Bayesian model averaging , 1999, COLT '99.
[84] S. Athey,et al. Generalized random forests , 2016, The Annals of Statistics.
[85] Luc Devroye,et al. Distribution-free performance bounds for potential function rules , 1979, IEEE Trans. Inf. Theory.
[86] Ambuj Tewari,et al. Smoothness, Low Noise and Fast Rates , 2010, NIPS.
[87] Antonio Criminisi,et al. Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning , 2012, Found. Trends Comput. Graph. Vis..
[88] Vladimir Vovk,et al. Prediction with Expert Evaluators' Advice , 2009, ALT.
[89] Jayanta K. Ghosh,et al. Higher Order Asymptotics , 1994 .
[90] W. Wong,et al. Probability inequalities for likelihood ratios and convergence rates of sieve MLEs , 1995 .
[91] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.
[92] Erwan Scornet,et al. Random Forests and Kernel Methods , 2015, IEEE Transactions on Information Theory.
[93] P. Yaskov. Sharp lower bounds on the least singular value of a random matrix without the fourth moment condition , 2015 .
[94] Cosma Rohilla Shalizi,et al. Adapting to Non-stationarity with Growing Expert Ensembles , 2011, ArXiv.
[95] Kfir Y. Levy,et al. Fast Rates for Exp-concave Empirical Risk Minimization , 2015, NIPS.
[96] Tor Lattimore,et al. Following the Leader and Fast Rates in Online Linear Prediction: Curved Constraint Sets and Other Regularities , 2017, J. Mach. Learn. Res..
[97] M. Rudelson,et al. The smallest singular value of a random rectangular matrix , 2008, 0802.3956.
[98] D. Ruppert,et al. Efficient Estimations from a Slowly Convergent Robbins-Monro Process , 1988 .
[99] Andrew R. Barron,et al. Asymptotic minimax regret for data compression, gambling, and prediction , 1997, IEEE Trans. Inf. Theory.
[100] Nick Littlestone,et al. From on-line to batch learning , 1989, COLT '89.
[101] Adele Cutler,et al. PERT – Perfect Random Tree Ensembles , 2001 .
[102] Soumendu Sundar Mukherjee,et al. Weak convergence and empirical processes , 2019 .
[103] Eric Moulines,et al. Non-Asymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning , 2011, NIPS.
[104] C. J. Stone,et al. Optimal Global Rates of Convergence for Nonparametric Regression , 1982 .
[105] Tong Zhang,et al. Information-theoretic upper and lower bounds for statistical estimation , 2006, IEEE Transactions on Information Theory.
[106] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[107] Leo Breiman,et al. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author) , 2001 .
[108] Alexander Shapiro,et al. Stochastic Approximation approach to Stochastic Programming , 2013 .
[109] P. J. Huber. Robust Regression: Asymptotics, Conjectures and Monte Carlo , 1973 .
[110] Thomas G. Dietterich. An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees: Bagging, Boosting, and Randomization , 2000, Machine Learning.
[111] Yuhong Yang,et al. Minimax Nonparametric Classification—Part I: Rates of Convergence , 1998 .
[112] N. Merhav,et al. Low complexity sequential lossless coding for piecewise stationary memoryless sources , 1998, Proceedings. 1998 IEEE International Symposium on Information Theory (Cat. No.98CH36252).
[113] Noureddine El Karoui,et al. Geometric sensitivity of random matrix results: consequences for shrinkage estimators of covariance and related statistical methods , 2011, 1105.1404.
[114] Peter L. Bartlett,et al. Exchangeability Characterizes Optimality of Sequential Normalized Maximum Likelihood and Bayesian Prediction , 2012, IEEE Transactions on Information Theory.
[115] A. Tsybakov,et al. Optimal aggregation of classifiers in statistical learning , 2003 .
[116] Nathan Srebro,et al. Fast Rates for Regularized Objectives , 2008, NIPS.
[117] T. Tao,et al. Inverse Littlewood-Offord theorems and the condition number of random discrete matrices , 2005, math/0511215.
[118] Jean-Yves Audibert,et al. Linear regression through PAC-Bayesian truncation , 2010, 1010.0072.
[119] Vladimir Koltchinskii,et al. Rademacher penalties and structural risk minimization , 2001, IEEE Trans. Inf. Theory.
[120] Haipeng Luo,et al. Achieving All with No Parameters: AdaNormalHedge , 2015, COLT.
[121] Rong Jin,et al. Lower and Upper Bounds on the Generalization of Stochastic Exponentially Concave Optimization , 2015, COLT.
[122] Adrien-Marie Legendre,et al. Nouvelles méthodes pour la détermination des orbites des comètes , 1970 .
[123] Stéphan Clémençon,et al. Ranking forests , 2013, J. Mach. Learn. Res..
[124] Misha Denil,et al. Consistency of Online Random Forests , 2013, ICML.
[125] Yee Whye Teh,et al. The Mondrian Process , 2008, NIPS.
[126] R. Welsch,et al. The Hat Matrix in Regression and ANOVA , 1978 .
[127] Shahar Mendelson,et al. Mean Estimation and Regression Under Heavy-Tailed Distributions: A Survey , 2019, Found. Comput. Math..
[128] Ronald L. Rivest,et al. Constructing Optimal Binary Decision Trees is NP-Complete , 1976, Inf. Process. Lett..
[129] R. Samworth. Optimal weighted nearest neighbour classifiers , 2011, 1101.5783.
[130] Yuhong Yang,et al. Information-theoretic determination of minimax rates of convergence , 1999 .
[131] Roman Vershynin,et al. High-Dimensional Probability , 2018 .
[132] Daniel J. Hsu,et al. Loss Minimization and Parameter Estimation with Heavy Tails , 2013, J. Mach. Learn. Res..
[133] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[134] Aleksandrs Slivkins,et al. One Practical Algorithm for Both Stochastic and Adversarial Bandits , 2014, ICML.
[135] K. Wachter. The Strong Limits of Random Matrix Spectra for Sample Matrices of Independent Elements , 1978 .
[136] Anant Sahai,et al. Harmless interpolation of noisy data in regression , 2019, 2019 IEEE International Symposium on Information Theory (ISIT).
[137] S. Mendelson,et al. Learning subgaussian classes : Upper and minimax bounds , 2013, 1305.4825.
[138] D. Blackwell. An analog of the minimax theorem for vector payoffs. , 1956 .
[139] G. Wahba. Spline models for observational data , 1990 .
[140] Sham M. Kakade,et al. Online Bounds for Bayesian Algorithms , 2004, NIPS.
[141] P. Massart,et al. Risk bounds for statistical learning , 2007, math/0702683.
[142] Yee Whye Teh,et al. Mondrian Forests: Efficient Online Random Forests , 2014, NIPS.
[143] R. Z. Khasʹminskiĭ,et al. Statistical estimation : asymptotic theory , 1981 .
[144] Arkadi Nemirovski,et al. Topics in Non-Parametric Statistics , 2000 .
[145] Francis R. Bach,et al. Adaptivity of averaged stochastic gradient descent to local strong convexity for logistic regression , 2013, J. Mach. Learn. Res..
[146] Yishay Mansour,et al. Improved second-order bounds for prediction with expert advice , 2006, Machine Learning.
[147] Stéphane Gaïffas,et al. On the optimality of the Hedge algorithm in the stochastic regime , 2018, J. Mach. Learn. Res..
[148] Jean-Yves Audibert,et al. Robust linear least squares regression , 2010, 1010.0074.
[149] Vianney Perchet,et al. ONLINE LEARNING AND GAME THEORY. A QUICK OVERVIEW WITH RECENT RESULTS AND APPLICATIONS , 2015 .
[150] Kazuoki Azuma. WEIGHTED SUMS OF CERTAIN DEPENDENT RANDOM VARIABLES , 1967 .
[151] Tengyuan Liang,et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize , 2018, The Annals of Statistics.
[152] X. Fernique. Regularite des trajectoires des fonctions aleatoires gaussiennes , 1975 .
[153] Marina Daecher. Open Problems In Communication And Computation , 2016 .
[154] D. Freedman,et al. How Many Variables Should Be Entered in a Regression Equation , 1983 .
[155] Jorma Rissanen,et al. The Minimum Description Length Principle in Coding and Modeling , 1998, IEEE Trans. Inf. Theory.
[156] Mark Herbster,et al. Tracking the Best Expert , 1995, Machine Learning.
[157] Andrew M. Saxe,et al. High-dimensional dynamics of generalization error in neural networks , 2017, Neural Networks.
[158] Daniel M. Roy. Computability, inference and modeling in probabilistic programming , 2011 .
[159] Roberto Imbuzeiro Oliveira,et al. The lower tail of random quadratic forms with applications to ordinary least squares , 2013, ArXiv.
[160] S. Boucheron,et al. Theory of classification : a survey of some recent advances , 2005 .
[161] Leo Breiman,et al. Random Forests , 2001, Machine Learning.
[162] Luc Devroye,et al. Distribution-free inequalities for the deleted and holdout error estimates , 1979, IEEE Trans. Inf. Theory.
[163] Nicolò Cesa-Bianchi,et al. Mirror Descent Meets Fixed Share (and feels no regret) , 2012, NIPS.
[164] Rory A. Fisher,et al. Theory of Statistical Estimation , 1925, Mathematical Proceedings of the Cambridge Philosophical Society.
[165] Yee Whye Teh,et al. Mondrian Forests for Large-Scale Regression when Uncertainty Matters , 2015, AISTATS.
[166] Peter Auer,et al. An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits , 2016, COLT.
[167] Malay Ghosh,et al. Nonsubjective priors via predictive relative entropy regret , 2006 .
[168] Nicolai Meinshausen,et al. Quantile Regression Forests , 2006, J. Mach. Learn. Res..
[169] Scott McQuade,et al. Global Climate Model Tracking Using Geospatial Neighborhoods , 2012, AAAI.
[170] E. Candès,et al. A modern maximum-likelihood theory for high-dimensional logistic regression , 2018, Proceedings of the National Academy of Sciences.
[171] Misha Denil,et al. Narrowing the Gap: Random Forests In Theory and In Practice , 2013, ICML.
[172] Noureddine El Karoui,et al. Asymptotic behavior of unregularized and ridge-regularized high-dimensional robust regression estimators : rigorous results , 2013, 1311.2445.
[173] Sébastien Bubeck,et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..
[174] Wouter M. Koolen,et al. Adaptive Hedge , 2011, NIPS.
[175] V. Vu,et al. Small Ball Probability, Inverse Theorems, and Applications , 2012, 1301.0019.
[176] Peter Grünwald,et al. A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity , 2017, ALT.
[177] Abraham Wald,et al. Statistical Decision Functions , 1951 .
[178] Yuhong Yang. Mixing Strategies for Density Estimation , 2000 .
[179] L. Breiman. SOME INFINITY THEORY FOR PREDICTOR ENSEMBLES , 2000 .
[180] Sanjoy Dasgupta,et al. Which Spatial Partition Trees are Adaptive to Intrinsic Dimension? , 2009, UAI.
[181] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[182] Jorma Rissanen,et al. Fisher information and stochastic complexity , 1996, IEEE Trans. Inf. Theory.
[183] B. Ripley,et al. Robust Statistics , 2018, Encyclopedia of Mathematical Geosciences.
[184] E. Giné,et al. Some Limit Theorems for Empirical Processes , 1984 .
[185] Elad Hazan,et al. Logistic Regression: Tight Bounds for Stochastic and Online Optimization , 2014, COLT.
[186] Manfred K. Warmuth,et al. Tracking a Small Set of Experts by Mixing Past Posteriors , 2003, J. Mach. Learn. Res..
[187] M. Talagrand,et al. Probability in Banach Spaces: Isoperimetry and Processes , 1991 .
[188] Francis R. Bach,et al. Self-concordant analysis for logistic regression , 2009, ArXiv.
[189] Gábor Lugosi,et al. An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits , 2017, COLT.
[190] Wojciech Kotlowski,et al. Maximum Likelihood vs. Sequential Normalized Maximum Likelihood in On-line Density Estimation , 2011, COLT.
[191] Yuhong Yang,et al. An Asymptotic Property of Model Selection Criteria , 1998, IEEE Trans. Inf. Theory.
[192] Charles R. Johnson,et al. Matrix analysis , 1985, Statistical Inference for Engineers and Data Scientists.
[193] François Laviolette,et al. PAC-Bayesian learning of linear classifiers , 2009, ICML '09.
[194] Odalric-Ambrym Maillard,et al. Efficient tracking of a growing number of experts , 2017, ALT.
[195] Karthik Sridharan,et al. Sequential Probability Assignment with Binary Alphabets and Large Classes of Experts , 2015, ArXiv.
[196] J. Hájek. Local asymptotic minimax and admissibility in estimation , 1972 .
[197] E. L. Lehmann,et al. Theory of point estimation , 1950 .
[198] E. Mammen,et al. Smooth Discrimination Analysis , 1999 .
[199] M. Rudelson. Random Vectors in the Isotropic Position , 1996, math/9608208.
[200] Nishant Mehta,et al. Fast rates with high probability in exp-concave statistical learning , 2016, AISTATS.
[201] O. Catoni. PAC-BAYESIAN SUPERVISED CLASSIFICATION: The Thermodynamics of Statistical Learning , 2007, 0712.0248.
[202] Claire Monteleoni,et al. Tracking climate models , 2011, CIDU.
[203] Antonia Maria Tulino,et al. Random Matrix Theory and Wireless Communications , 2004, Found. Trends Commun. Inf. Theory.
[204] Frans M. J. Willems,et al. The context-tree weighting method: basic properties , 1995, IEEE Trans. Inf. Theory.
[205] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization , 2006, 0708.0083.
[206] A. Caponnetto,et al. Optimal Rates for the Regularized Least-Squares Algorithm , 2007, Found. Comput. Math..
[207] Eric Moulines,et al. Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n) , 2013, NIPS.
[208] Manfred K. Warmuth,et al. The Last-Step Minimax Algorithm , 2000, ALT.
[209] P. Yaskov. Lower bounds on the smallest eigenvalue of a sample covariance matrix. , 2014, 1409.6188.
[210] John Shawe-Taylor,et al. PAC-Bayes & Margins , 2002, NIPS.
[211] Pierre Priouret,et al. Adaptive Algorithms and Stochastic Approximations , 1990, Applications of Mathematics.
[212] Feng Liang,et al. Improved minimax predictive densities under Kullback-Leibler loss , 2006 .
[213] Nicolas Macris,et al. Optimal errors and phase transitions in high-dimensional generalized linear models , 2017, Proceedings of the National Academy of Sciences.
[214] V. Koltchinskii,et al. Bounding the smallest singular value of a random matrix without concentration , 2013, 1312.3580.
[215] J. Aitchison. Goodness of prediction fit , 1975 .
[216] R. Adamczak,et al. Quantitative estimates of the convergence of the empirical covariance matrix in log-concave ensembles , 2009, 0903.2323.
[217] Karthik Sridharan,et al. Learning with Square Loss: Localization through Offset Rademacher Complexity , 2015, COLT.
[218] V. Vovk. Competitive On‐line Statistics , 2001 .
[219] Stefan Wager,et al. Adaptive Concentration of Regression Trees, with Application to Random Forests , 2015 .
[220] Felipe Cucker,et al. Best Choices for Regularization Parameters in Learning Theory: On the Bias—Variance Problem , 2002, Found. Comput. Math..
[221] Jaouad Mourtada. Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices , 2019 .
[222] Vee Ming Ng,et al. On the estimation of parametric density functions , 1980 .
[223] W. Hoeffding. Probability Inequalities for sums of Bounded Random Variables , 1963 .
[224] Luc Devroye,et al. On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification , 2010, J. Multivar. Anal..
[225] Stefan Wager,et al. Estimation and Inference of Heterogeneous Treatment Effects using Random Forests , 2015, Journal of the American Statistical Association.
[226] P. Massart,et al. Minimum contrast estimators on sieves: exponential bounds and rates of convergence , 1998 .
[227] Alex M. Andrew,et al. Boosting: Foundations and Algorithms , 2012 .
[228] E. Wigner. On the Distribution of the Roots of Certain Symmetric Matrices , 1958 .
[229] Ullrich Köthe,et al. On Oblique Random Forests , 2011, ECML/PKDD.
[230] Erwan Scornet,et al. A random forest guided tour , 2015, TEST.
[231] J. Ross Quinlan,et al. Induction of Decision Trees , 1986, Machine Learning.
[232] Hemant Ishwaran,et al. Random Survival Forests , 2008, Wiley StatsRef: Statistics Reference Online.
[233] Olivier Wintenberger,et al. Optimal learning with Bernstein online aggregation , 2014, Machine Learning.
[234] Jean-Yves Audibert,et al. Progressive mixture rules are deviation suboptimal , 2007, NIPS.
[235] J. Rissanen,et al. ON SEQUENTIALLY NORMALIZED MAXIMUM LIKELIHOOD MODELS , 2008 .
[236] H. Chipman,et al. BART: Bayesian Additive Regression Trees , 2008, 0806.3286.
[237] Yurii Nesterov,et al. Interior-point polynomial algorithms in convex programming , 1994, Siam studies in applied mathematics.
[238] H. White. Maximum Likelihood Estimation of Misspecified Models , 1982 .
[239] Robin Genuer,et al. Variance reduction in purely random forests , 2012 .
[240] T. O’Neil. Geometric Measure Theory , 2002 .
[241] C. Esseen. On the Kolmogorov-Rogozin inequality for the concentration function , 1966 .
[242] Wouter M. Koolen,et al. Learning the Learning Rate for Prediction with Expert Advice , 2014, NIPS.
[243] S. Smale,et al. Learning Theory Estimates via Integral Operators and Their Approximations , 2007 .
[244] T. Tao. Topics in Random Matrix Theory , 2012 .
[245] Jean-Philippe Vert,et al. Consistency of Random Forests , 2014, 1405.2881.
[246] G. D. Murray,et al. NOTE ON ESTIMATION OF PROBABILITY DENSITY FUNCTIONS , 1977 .
[247] Matthias W. Seeger,et al. PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification , 2003, J. Mach. Learn. Res..
[248] R. Dudley. A course on empirical processes , 1984 .
[249] Tamás Linder,et al. Efficient Tracking of Large Classes of Experts , 2011, IEEE Transactions on Information Theory.
[250] Luc Devroye,et al. Consistency of Random Forests and Other Averaging Classifiers , 2008, J. Mach. Learn. Res..
[251] Lee H. Dicker,et al. Ridge regression and asymptotic minimax estimation over spheres of growing dimension , 2016, 1601.03900.
[252] S. Geer,et al. On higher order isotropy conditions and lower bounds for sparse quadratic forms , 2014, 1405.5995.
[253] Elad Hazan,et al. Introduction to Online Convex Optimization , 2016, Found. Trends Optim..
[254] Manfred K. Warmuth,et al. Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions , 1999, Machine Learning.
[255] Anne-Laure Boulesteix,et al. Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics , 2012, WIREs Data Mining Knowl. Discov..
[256] Larry Wasserman,et al. All of Nonparametric Statistics (Springer Texts in Statistics) , 2006 .
[257] Senén Barro,et al. Do we need hundreds of classifiers to solve real world classification problems? , 2014, J. Mach. Learn. Res..
[258] Don R. Hush,et al. Optimal Rates for Regularized Least Squares Regression , 2009, COLT.
[259] Fedor Zhdanov,et al. Prediction with Expert Advice under Discounted Loss , 2010, ALT.
[260] K. Hornik,et al. party : A Laboratory for Recursive Partytioning , 2009 .
[261] Vladimir Vovk,et al. Prediction with Advice of Unknown Number of Experts , 2010, UAI.
[262] A. Juditsky,et al. Learning by mirror averaging , 2005, math/0511468.
[263] C. J. Stone,et al. Consistent Nonparametric Regression , 1977 .
[264] Yoram Singer,et al. Using and combining predictors that specialize , 1997, STOC '97.
[265] Jean-Yves Audibert. Fast learning rates in statistical inference through aggregation , 2007, math/0703854.
[266] Aleksandrs Slivkins,et al. The Best of Both Worlds: Stochastic and Adversarial Bandits , 2012, COLT.
[267] M. Talagrand. Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems , 2014 .
[268] Jorma Rissanen,et al. Minimum Description Length Principle , 2010, Encyclopedia of Machine Learning.
[269] Adrien Saumard. On optimality of empirical risk minimization in linear aggregation , 2016, Bernoulli.
[270] Julian Zimmert,et al. Tsallis-INF: An Optimal Algorithm for Stochastic and Adversarial Bandits , 2018, J. Mach. Learn. Res..
[271] Yu-Hsien Peng. On Singular Values of Random Matrices , 2015 .
[272] John Darzentas,et al. Problem Complexity and Method Efficiency in Optimization , 1983 .
[273] H. P.. Annales de l'Institut Henri Poincaré , 1931, Nature.
[274] Alessandro Rudi,et al. Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance , 2019, COLT.
[275] T. N. Sriram. Asymptotics in Statistics–Some Basic Concepts , 2002 .
[276] Arthur E. Hoerl,et al. Application of ridge analysis to regression problems , 1962 .
[277] Gilles Stoltz,et al. A second-order bound with excess losses , 2014, COLT.
[278] J. Berkson. Application of the Logistic Function to Bio-Assay , 1944 .
[279] Ron Meir,et al. Generalization Error Bounds for Bayesian Mixture Algorithms , 2003, J. Mach. Learn. Res..
[280] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[281] A. Barron,et al. Jeffreys' prior is asymptotically least favorable under entropy risk , 1994 .
[282] Ian R. Harris. Predictive fit for natural exponential families , 1989 .
[283] C. J. Stone,et al. Additive Regression and Other Nonparametric Models , 1985 .
[284] Alexandre B. Tsybakov,et al. Optimal Rates of Aggregation , 2003, COLT.
[285] O. Catoni. Challenging the empirical mean and empirical variance: a deviation study , 2010, 1009.2048.
[286] Haipeng Luo,et al. Logistic Regression: The Importance of Being Improper , 2018, COLT.
[287] David Haussler,et al. Sequential Prediction of Individual Sequences Under General Loss Functions , 1998, IEEE Trans. Inf. Theory.
[288] Stéphane Gaïffas,et al. An improper estimator with optimal excess risk in misspecified density estimation and logistic regression , 2019, ArXiv.
[289] Gérard Biau,et al. Analysis of a Random Forests Model , 2010, J. Mach. Learn. Res..
[290] H. Chipman,et al. Bayesian CART Model Search , 1998 .
[291] Ameet Talwalkar,et al. Foundations of Machine Learning , 2012, Adaptive computation and machine learning.
[292] V. Rocková,et al. Posterior Concentration for Bayesian Regression Trees and their Ensembles , 2017 .
[293] Julian Zimmert,et al. Beating Stochastic and Adversarial Semi-bandits Optimally and Simultaneously , 2019, ICML.
[294] E. Rowland. Theory of Games and Economic Behavior , 1946, Nature.
[295] Gábor Lugosi,et al. Introduction to Statistical Learning Theory , 2004, Advanced Lectures on Machine Learning.
[296] Stéphane Gaïffas,et al. Universal consistency and minimax rates for online Mondrian Forests , 2017, NIPS.
[297] David A. McAllester. Some PAC-Bayesian Theorems , 1998, COLT' 98.
[298] C. Tracy,et al. Introduction to Random Matrices , 1992, hep-th/9210073.
[299] Dean Phillips Foster. Prediction in the Worst Case , 1991 .
[300] Shahar Mendelson,et al. Learning without Concentration , 2014, COLT.
[301] J. Berger. Statistical Decision Theory and Bayesian Analysis , 1988 .
[302] Ohad Shamir,et al. Learnability, Stability and Uniform Convergence , 2010, J. Mach. Learn. Res..
[303] Mihaela Aslan,et al. Asymptotically minimax Bayes predictive densities , 2006, 0708.0177.
[304] P. Bartlett,et al. Local Rademacher complexities , 2005, math/0508275.
[305] Lorenzo Rosasco,et al. Learning with Incremental Iterative Regularization , 2014, NIPS.
[306] Edward I. George,et al. Admissible predictive density estimation , 2008 .
[307] Y. Shtarkov,et al. Sequential Weighting Algorithms for Multi-Alphabet Sources ∗ , 1993 .
[308] M. Rudelson,et al. Non-asymptotic theory of random matrices: extreme singular values , 2010, 1003.2990.
[309] Jian-Feng Yao,et al. Convergence Rates of Spectral Distributions of Large Sample Covariance Matrices , 2003, SIAM J. Matrix Anal. Appl..
[310] Mikhail Belkin,et al. Two models of double descent for weak features , 2019, SIAM J. Math. Data Sci..
[311] S. Mendelson,et al. Performance of empirical risk minimization in linear aggregation , 2014, 1402.5763.
[312] R. Adamczak,et al. Sharp bounds on the rate of convergence of the empirical covariance matrix , 2010, 1012.0294.
[313] Peter L. Bartlett,et al. Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families , 2013, COLT.
[314] Sanjoy Dasgupta,et al. Random projection trees and low dimensional manifolds , 2008, STOC.
[315] V. Koltchinskii,et al. Oracle inequalities in empirical risk minimization and sparse recovery problems , 2011 .
[316] Wouter M. Koolen,et al. Second-order Quantile Methods for Experts and Combinatorial Games , 2015, COLT.
[317] David A. McAllester. Simplified PAC-Bayesian Margin Bounds , 2003, COLT.
[318] Andrea Montanari,et al. High dimensional robust M-estimation: asymptotic variance via approximate message passing , 2013, Probability Theory and Related Fields.
[319] Frans M. J. Willems,et al. The Context-Tree Weighting Method : Extensions , 1998, IEEE Trans. Inf. Theory.
[320] Michael R. Kosorok,et al. Some asymptotic results of survival tree and forest models , 2017 .
[321] J. Friedman. Greedy function approximation: A gradient boosting machine. , 2001 .
[322] Martin Zinkevich,et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.
[323] J. Picard,et al. Statistical learning theory and stochastic optimization : École d'eté de probabilités de Saint-Flour XXXI - 2001 , 2004 .
[324] R. Bhatia. Positive Definite Matrices , 2007 .
[325] L. Breiman. CONSISTENCY FOR A SIMPLE MODEL OF RANDOM FORESTS , 2004 .
[326] Boris Polyak,et al. Acceleration of stochastic approximation by averaging , 1992 .
[327] Frans M. J. Willems,et al. Coding for a binary independent piecewise-identically-distributed source , 1996, IEEE Trans. Inf. Theory.
[328] Adrian F. M. Smith,et al. A Bayesian CART algorithm , 1998 .
[329] John K Kruschke,et al. Bayesian data analysis. , 2010, Wiley interdisciplinary reviews. Cognitive science.
[330] David Haussler,et al. How to use expert advice , 1993, STOC.
[331] Haipeng Luo,et al. A Drifting-Games Analysis for Online Learning and Applications to Boosting , 2014, NIPS.
[332] Claudio Gentile,et al. On the generalization ability of on-line learning algorithms , 2001, IEEE Transactions on Information Theory.
[333] Yoav Freund,et al. A Parameter-free Hedging Algorithm , 2009, NIPS.
[334] AI Koan,et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.
[335] Wouter M. Koolen,et al. Minimax Fixed-Design Linear Regression , 2015, COLT.