Distribution-Free Robust Linear Regression

We study random design linear regression with no assumptions on the distribution of the covariates and with a heavy-tailed response variable. In this distribution-free regression setting, we show that boundedness of the conditional second moment of the response given the covariates is a necessary and sufficient condition for achieving nontrivial guarantees. As a starting point, we prove an optimal version of the classical in-expectation bound for the truncated least squares estimator due to Györfi, Kohler, Krzyżak, and Walk. However, we show that this procedure fails with constant probability for some distributions despite its optimal in-expectation performance. Then, combining the ideas of truncated least squares, median-of-means procedures, and aggregation theory, we construct a non-linear estimator achieving excess risk of order d/n with an optimal sub-exponential tail. While existing approaches to linear regression for heavy-tailed distributions focus on proper estimators that return linear functions, we highlight that the improperness of our procedure is necessary for attaining nontrivial guarantees in the distribution-free setting.

MSC2020 Subject Classifications: 62J05; 62G35; 68Q32.
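The paper's actual estimator is a careful combination of the named ingredients; as a rough, hypothetical illustration only, the sketch below shows the two building blocks in isolation: a least squares fit whose predictions are truncated at a fixed level (guarding against a heavy-tailed response), and a median-of-means risk estimate (robust to heavy-tailed losses). The truncation level and block count are arbitrary choices for the toy example, not values from the paper.

```python
import numpy as np

def truncated_least_squares(X, y, trunc_level):
    """Fit ordinary least squares, then clip predictions at +/- trunc_level.

    Truncating the fitted linear function is the classical device (as in
    Györfi, Kohler, Krzyżak, and Walk) for controlling the squared-loss
    risk when the response may be heavy-tailed.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Xnew: np.clip(Xnew @ beta, -trunc_level, trunc_level)

def median_of_means_risk(predict, X, y, n_blocks=5):
    """Median-of-means estimate of the empirical squared-loss risk.

    Splitting the sample into blocks and taking the median of the
    per-block mean losses gives a risk estimate with sub-exponential
    deviations even when individual losses are heavy-tailed.
    """
    losses = (predict(X) - y) ** 2
    blocks = np.array_split(losses, n_blocks)
    return np.median([block.mean() for block in blocks])

# Toy usage: d = 3 covariates, n = 300 samples, heavy-tailed noise.
rng = np.random.default_rng(0)
n, d = 300, 3
X = rng.normal(size=(n, d))
beta_star = np.array([1.0, -2.0, 0.5])
y = X @ beta_star + rng.standard_t(df=2.5, size=n)  # Student-t noise, heavy tails

f_hat = truncated_least_squares(X, y, trunc_level=10.0)
risk = median_of_means_risk(f_hat, X, y)
```

Note that the truncated predictor is already non-linear (hence improper), which is exactly the property the paper shows to be necessary in the distribution-free setting.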


[82]  G. A. Young,et al.  High‐dimensional Statistics: A Non‐asymptotic Viewpoint, Martin J.Wainwright, Cambridge University Press, 2019, xvii 552 pages, £57.99, hardback ISBN: 978‐1‐1084‐9802‐9 , 2020, International Statistical Review.