Nearly Linear Row Sampling Algorithm for Quantile Regression

We give a row sampling algorithm for the quantile loss function whose sample complexity is nearly linear in the dimensionality of the data, improving upon the previous best algorithm, whose sample complexity has at least cubic dependence on the dimensionality. Based on our row sampling algorithm, we give the fastest known algorithm for quantile regression and a graph sparsification algorithm for balanced directed graphs. Our main technical contribution is to show that Lewis weights sampling, which has been used in row sampling algorithms for $\ell_p$ norms, can also be applied in row sampling algorithms for a variety of loss functions. We complement our theoretical results with experiments that demonstrate the practicality of our approach.
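
To make the idea of row sampling for the quantile loss concrete, the following is a minimal illustrative sketch and not the algorithm analyzed in the paper. It assumes the standard quantile loss $\rho_\tau(r) = \tau \max(r, 0) + (1 - \tau) \max(-r, 0)$, approximates $\ell_1$ Lewis weights with the usual fixed-point iteration, and samples rows with replacement in proportion to those weights. The function names, the iteration count, and the rescaling by $1/(m p_i)$ are assumptions chosen for illustration; the input matrix is assumed to have full column rank and no zero rows.

```python
import numpy as np


def quantile_loss(residual, tau):
    """Sum of tau * max(r, 0) + (1 - tau) * max(-r, 0) over all residuals."""
    r = np.asarray(residual, dtype=float)
    pos = np.maximum(r, 0.0)
    neg = np.maximum(-r, 0.0)
    return float(np.sum(tau * pos + (1.0 - tau) * neg))


def l1_lewis_weights(A, n_iter=30):
    """Approximate l1 Lewis weights with the fixed-point iteration
    w_i <- sqrt(a_i^T (A^T W^{-1} A)^{-1} a_i), starting from w = 1.

    Assumes A has full column rank and no all-zero rows.
    """
    n, _ = A.shape
    w = np.ones(n)
    for _ in range(n_iter):
        M = A.T @ (A / w[:, None])          # A^T W^{-1} A
        M_inv = np.linalg.inv(M)
        # Quadratic form a_i^T M_inv a_i for every row i.
        w = np.sqrt(np.einsum("ij,jk,ik->i", A, M_inv, A))
    return w


def sample_rows(A, b, m, rng):
    """Draw m rows with probability proportional to the Lewis weights and
    rescale each drawn row by 1 / (m * p_i).

    The quantile loss is positively homogeneous, so the loss of the rescaled
    sample is an unbiased estimate of the loss on the full data.
    """
    w = l1_lewis_weights(A)
    p = w / w.sum()
    idx = rng.choice(A.shape[0], size=m, replace=True, p=p)
    scale = 1.0 / (m * p[idx])
    return A[idx] * scale[:, None], b[idx] * scale
```

A typical use would call `sample_rows(A, b, m, np.random.default_rng(0))` and then compare `quantile_loss(A_s @ x - b_s, tau)` against `quantile_loss(A @ x - b, tau)` for a few candidate vectors `x`. How large `m` must be for the two to agree for all `x` is exactly what the paper's analysis addresses, and this sketch makes no claim about it.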
