Quantile regression for large-scale data via sparse exponential transform method

ABSTRACT In recent decades, quantile regression has received growing attention from academics and practitioners. However, most existing computational algorithms are effective only for small or moderate-size problems; they cannot solve quantile regression with large-scale data reliably and efficiently. To address this, we propose a new algorithm that implements quantile regression on large-scale data using the sparse exponential transform (SET) method. The algorithm constructs a well-conditioned basis and a sampling matrix to reduce the number of observations, then solves a quantile regression problem on the reduced matrix to obtain an approximate solution. Through simulation studies and an empirical analysis of a 5% sample of the US 2000 Census data, we demonstrate the efficiency of the SET-based algorithm. Numerical results indicate that the new algorithm is efficient in computation time and performs well for large-scale quantile regression.
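
The condition-then-sample-then-solve pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' SET algorithm: the abstract does not give the details of the sparse exponential transform, so a plain QR factorization is swapped in as the conditioning step. The function names (`solve_quantile_lp`, `sampled_quantile_regression`), the use of the augmented matrix [X, y], and the l1 row-norm sampling scores are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def solve_quantile_lp(X, y, tau, weights=None):
    """Solve (weighted) quantile regression as a linear program.

    Minimizes sum_i w_i * (tau * u_i + (1 - tau) * v_i)
    subject to X @ beta + u - v = y, with u >= 0, v >= 0, beta free.
    """
    n, d = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    cost = np.concatenate([np.zeros(d), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * d + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d]


def sampled_quantile_regression(X, y, tau, sample_size, rng):
    """Approximate quantile regression on a row-sampled subproblem.

    Assumption: QR stands in for the conditioning step (the paper uses SET).
    """
    n = X.shape[0]
    # Step 1: build a well-conditioned basis for the augmented matrix [X, y].
    A = np.column_stack([X, y])
    Q, _ = np.linalg.qr(A)  # reduced QR: Q has orthonormal columns
    # Step 2: derive sampling probabilities from the l1 norms of the basis rows.
    scores = np.abs(Q).sum(axis=1)
    probs = np.minimum(1.0, sample_size * scores / scores.sum())
    keep = rng.random(n) < probs
    # Step 3: solve the reweighted problem on the retained rows only.
    weights = 1.0 / probs[keep]
    return solve_quantile_lp(X[keep], y[keep], tau, weights)
```

Calling `sampled_quantile_regression(X, y, 0.5, 400, np.random.default_rng(0))` on a design matrix with a few thousand rows solves a linear program on roughly 400 of them. The inverse-probability weights keep the sampled objective an unbiased estimate of the full check-loss objective, which is why the reduced problem can serve as an approximation to the full one.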
