A Precise High-Dimensional Asymptotic Theory for Boosting and Min-L1-Norm Interpolated Classifiers
[1] Thomas M. Cover,et al. Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition , 1965, IEEE Trans. Electron. Comput..
[2] A. Albert,et al. On the existence of maximum likelihood estimates in logistic regression models , 1984 .
[3] Y. Gordon. Some inequalities for Gaussian processes and applications , 1985 .
[4] Thomas J. Santner,et al. A note on A. Albert and J. A. Anderson's conditions for the existence of maximum likelihood estimates in logistic regression models , 1986 .
[5] E. Gardner. The space of interactions in neural network models , 1988 .
[6] Y. Gordon. On Milman's inequality and random subspaces which escape through a mesh in ℝⁿ , 1988 .
[7] Emmanuel Lesaffre,et al. Partial Separation in Logistic Discrimination , 1989 .
[8] Yoav Freund. Boosting a weak learning algorithm by majority , 1995, Inf. Comput.
[9] Yoav Freund,et al. Boosting a weak learning algorithm by majority , 1990, COLT '90.
[10] Corinna Cortes,et al. Boosting Decision Trees , 1995, NIPS.
[11] Yoav Freund,et al. A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.
[12] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.
[13] J. Ross Quinlan,et al. Bagging, Boosting, and C4.5 , 1996, AAAI/IAAI, Vol. 1.
[14] Leo Breiman,et al. Bias, Variance, and Arcing Classifiers , 1996 .
[15] Yoav Freund,et al. Boosting the margin: A new explanation for the effectiveness of voting methods , 1997, ICML.
[16] Dale Schuurmans,et al. Boosting in the Limit: Maximizing the Margin of Learned Ensembles , 1998, AAAI/IAAI.
[17] L. Breiman. Arcing Classifiers , 1998 .
[18] Peter L. Bartlett,et al. Boosting Algorithms as Gradient Descent , 1999, NIPS.
[19] Leo Breiman,et al. Prediction Games and Arcing Algorithms , 1999, Neural Computation.
[20] J. Friedman. Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .
[21] J. Friedman. Greedy function approximation: A gradient boosting machine. , 2001 .
[22] B. Yu,et al. Boosting with the L2-loss: Regression and Classification , 2001 .
[23] Shie Mannor,et al. Geometric Bounds for Generalization in Boosting , 2001, COLT/EuroCOLT.
[24] Wenxin Jiang,et al. Some Theoretical Aspects of Boosting in the Presence of Noisy Data , 2001, ICML.
[25] M. Shcherbina,et al. Rigorous Solution of the Gardner Problem , 2001, math-ph/0112003.
[26] V. Koltchinskii,et al. Empirical margin distributions and bounding the generalization error of combined classifiers , 2002, math/0405343.
[27] Shie Mannor,et al. The Consistency of Greedy Algorithms for Classification , 2002, COLT.
[28] P. Bühlmann,et al. Boosting With the L2 Loss , 2003 .
[29] Wenxin Jiang. Process consistency for AdaBoost , 2003 .
[30] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization , 2003 .
[31] G. Lugosi,et al. On the Bayes-risk consistency of regularized boosting methods , 2003 .
[32] Gilles Blanchard,et al. On the Rate of Convergence of Regularized Boosting Classifiers , 2003, J. Mach. Learn. Res..
[33] L. Breiman. Population theory for boosting ensembles , 2003 .
[34] Shie Mannor,et al. On the Existence of Linear Weak Learners and Applications to Boosting , 2002, Machine Learning.
[35] Gunnar Rätsch,et al. Soft Margins for AdaBoost , 2001, Machine Learning.
[36] Yoram Singer,et al. Logistic Regression, AdaBoost and Bregman Distances , 2000, Machine Learning.
[37] Ji Zhu,et al. Boosting as a Regularized Path to a Maximum Margin Classifier , 2004, J. Mach. Learn. Res..
[38] Gunnar Rätsch,et al. Efficient Margin Maximizing with Boosting , 2005, J. Mach. Learn. Res..
[39] R. Schapire. The Strength of Weak Learnability , 1990, Machine Learning.
[40] Bin Yu,et al. Boosting with early stopping: Convergence and consistency , 2005, math/0508276.
[41] V. Koltchinskii,et al. Complexities of convex combinations and bounding the generalization error in classification , 2004, math/0405356.
[42] Vladimir Koltchinskii,et al. Exponential Convergence Rates in Classification , 2005, COLT.
[43] Robert E. Schapire,et al. How boosting the margin can also boost classifier complexity , 2006, ICML.
[44] Peter Bühlmann. Boosting for high-dimensional linear models , 2006, math/0606789.
[45] P. Bühlmann,et al. Sparse Boosting , 2006, J. Mach. Learn. Res..
[46] Peter L. Bartlett,et al. AdaBoost is Consistent , 2006, J. Mach. Learn. Res..
[47] P. Bickel,et al. Some Theory for Generalized Boosting Algorithms , 2006, J. Mach. Learn. Res..
[48] Peter Bühlmann,et al. Boosting Algorithms: Regularization, Prediction and Model Fitting , 2007, 0804.2752.
[49] R. Schapire,et al. Analysis of boosting algorithms using the smooth margin function , 2007, 0803.4092.
[50] Ali Rahimi,et al. Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning , 2008, NIPS.
[51] C. Villani. Optimal Transport: Old and New , 2008 .
[52] Andrea Montanari,et al. Message-passing algorithms for compressed sensing , 2009, Proceedings of the National Academy of Sciences.
[53] Torsten Hothorn,et al. Twin Boosting: improved feature selection and prediction , 2010, Stat. Comput..
[54] Yoram Singer,et al. On the equivalence of weak learnability and linear separability: new relaxations and efficient boosting algorithms , 2010, Machine Learning.
[55] Cynthia Rudin,et al. The Rate of Convergence of Adaboost , 2011, COLT.
[56] Matus Telgarsky,et al. Margins, Shrinkage, and Boosting , 2013, ICML.
[57] Mihailo Stojnic,et al. A framework to characterize performance of LASSO algorithms , 2013, ArXiv.
[58] L. Ambrosio,et al. A User’s Guide to Optimal Transport , 2013 .
[59] P. Bickel,et al. On robust regression with high-dimensional predictors , 2013, Proceedings of the National Academy of Sciences.
[60] Mihailo Stojnic,et al. Meshes that trap random subspaces , 2013, ArXiv.
[61] Paul Grigas,et al. AdaBoost and Forward Stagewise Regression are First-Order Convex Optimization Methods , 2013, ArXiv.
[62] Andrea Montanari,et al. High dimensional robust M-estimation: asymptotic variance via approximate message passing , 2013, Probability Theory and Related Fields.
[63] Christos Thrampoulidis,et al. A Tight Version of the Gaussian min-max theorem in the Presence of Convexity , 2014, ArXiv.
[64] Stephen P. Boyd,et al. Proximal Algorithms , 2013, Found. Trends Optim..
[65] Paul Grigas,et al. A New Perspective on Boosting in Linear Regression via Subgradient Optimization and Relatives , 2015, ArXiv.
[66] Christos Thrampoulidis,et al. Regularized Linear Regression: A Precise Analysis of the Estimation Error , 2015, COLT.
[67] Alexander Hanbo Li,et al. Boosting in the Presence of Outliers: Adaptive Classification With Nonconvex Loss Functions , 2015, ArXiv.
[68] Christos Thrampoulidis,et al. LASSO with Non-linear Measurements is Equivalent to One With Linear Measurements , 2015, NIPS.
[69] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[70] Babak Hassibi,et al. A Universal Analysis of Large-Scale Regularized Least Squares Solutions , 2017, NIPS.
[71] Francis R. Bach,et al. Breaking the Curse of Dimensionality with Convex Neural Networks , 2014, J. Mach. Learn. Res..
[72] Mikhail Belkin,et al. To understand deep learning we need to understand kernel learning , 2018, ICML.
[73] Nathan Srebro,et al. Characterizing Implicit Bias in Terms of Optimization Geometry , 2018, ICML.
[74] Tengyuan Liang,et al. Just Interpolate: Kernel "Ridgeless" Regression Can Generalize , 2018, The Annals of Statistics.
[75] Alexandra Chouldechova,et al. The Frontiers of Fairness in Machine Learning , 2018, ArXiv.
[76] Christos Thrampoulidis,et al. Precise Error Analysis of Regularized $M$ -Estimators in High Dimensions , 2016, IEEE Transactions on Information Theory.
[77] E. Candès,et al. The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression , 2018, The Annals of Statistics.
[78] Noureddine El Karoui,et al. On the impact of predictor geometry on the performance of high-dimensional ridge-regularized generalized robust regression estimators , 2018 .
[79] Mikhail Belkin,et al. Reconciling modern machine learning and the bias-variance trade-off , 2018, ArXiv.
[80] Zachary Chase Lipton. The mythos of model interpretability , 2016, ACM Queue.
[82] Mikhail Belkin,et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off , 2018, Proceedings of the National Academy of Sciences.
[83] Christos Thrampoulidis,et al. Phase Retrieval via Polytope Optimization: Geometry, Phase Transitions, and New Algorithms , 2018, ArXiv.
[84] Tengyuan Liang,et al. On the Risk of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels , 2019, ArXiv.
[85] Yuxin Chen,et al. The likelihood ratio test in high-dimensional logistic regression is asymptotically a rescaled Chi-square , 2017, Probability Theory and Related Fields.
[86] A. Montanari,et al. The generalization error of max-margin linear classifiers: High-dimensional asymptotics in the overparametrized regime , 2019 .
[87] Hong Hu,et al. Asymptotics and Optimal Designs of SLOPE for Sparse Linear Regression , 2019, 2019 IEEE International Symposium on Information Theory (ISIT).
[88] Christos Thrampoulidis,et al. A Model of Double Descent for High-dimensional Binary Linear Classification , 2019, Information and Inference: A Journal of the IMA.
[89] Adrian Weller,et al. Transparency: Motivations and Challenges , 2019, Explainable AI.
[90] E. Candès,et al. A modern maximum-likelihood theory for high-dimensional logistic regression , 2018, Proceedings of the National Academy of Sciences.
[91] A. Montanari,et al. Mean Field Asymptotics in High-Dimensional Statistics: From Exact Results to Efficient Algorithms , 2019, Proceedings of the International Congress of Mathematicians (ICM 2018).
[92] Tengyuan Liang,et al. Training Neural Networks as Learning Data-adaptive Kernels: Provable Representation and Approximation Benefits , 2019, Journal of the American Statistical Association.
[93] Jon M. Kleinberg,et al. Simplicity Creates Inequity: Implications for Fairness, Stereotypes, and Interpretability , 2018, EC.
[94] Cynthia Rudin,et al. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , 2018, Nature Machine Intelligence.
[95] Andrea Montanari,et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation , 2019, Annals of statistics.
[96] Mikhail Belkin,et al. Does data interpolation contradict statistical optimality? , 2018, AISTATS.
[97] Yue M. Lu,et al. Universality Laws for High-Dimensional Learning with Random Features , 2020, ArXiv.
[98] Tengyuan Liang,et al. Mehler’s Formula, Branching Process, and Compositional Kernels of Deep Neural Networks , 2020, Journal of the American Statistical Association.
[99] Tengyuan Liang,et al. On the Multiple Descent of Minimum-Norm Interpolants and Restricted Lower Isometry of Kernels , 2019, COLT.
[100] Andrea Montanari,et al. The Lasso with general Gaussian designs with applications to hypothesis testing , 2020, ArXiv.
[101] Francis Bach,et al. Implicit Bias of Gradient Descent for Wide Two-layer Neural Networks Trained with the Logistic Loss , 2020, COLT.
[102] Emmanuel J. Candes,et al. The asymptotic distribution of the MLE in high-dimensional logistic models: Arbitrary covariance , 2020, Bernoulli.
[103] Manfred K. Warmuth,et al. Winnowing with Gradient Descent , 2020, COLT.
[104] Philip M. Long,et al. Benign overfitting in linear regression , 2019, Proceedings of the National Academy of Sciences.
[105] A. Maleki,et al. Which bridge estimator is the best for variable selection? , 2020 .
[106] Mohamed-Slim Alouini,et al. Precise Error Analysis of the LASSO under Correlated Designs , 2020, ArXiv.
[107] Mikhail Belkin,et al. Two models of double descent for weak features , 2019, SIAM J. Math. Data Sci..
[108] Florentina Bunea,et al. Interpolation under latent factor regression models , 2020, ArXiv.
[109] Matus Telgarsky,et al. Characterizing the implicit bias via a primal-dual analysis , 2019, ALT.
[110] Andrea Montanari,et al. The Generalization Error of Random Features Regression: Precise Asymptotics and the Double Descent Curve , 2019, Communications on Pure and Applied Mathematics.
[111] Hanwen Huang. LASSO risk and phase transition under dependence , 2021 .
[112] Philip M. Long,et al. Finite-sample analysis of interpolating linear classifiers in the overparameterized regime , 2020, ArXiv.