Householder Sketch for Accurate and Accelerated Least-Mean-Squares Solvers

Least-Mean-Squares (LMS) solvers address a class of fundamental optimization problems, including linear regression and regularized regressions such as Ridge, LASSO, and Elastic-Net. Data summarization techniques for big data generate summaries called coresets and sketches to speed up model learning under streaming and distributed settings. For example, (Maalouf et al., 2019) design a fast and accurate Caratheodory set on input data to boost the performance of existing LMS solvers. In retrospect, we explore the classical Householder transformation as a candidate for sketching and accurately solving LMS problems. We find it to be a simpler, more memory-efficient, and faster alternative to the above strong baseline, and one that has always existed. We also present a scalable algorithm based on the construction of distributed Householder sketches to solve LMS problems across multiple worker nodes. We perform a thorough empirical analysis with large synthetic and real datasets to evaluate the performance of the Householder sketch and compare it with (Maalouf et al., 2019). Our results show that the Householder sketch speeds up existing LMS solvers in the scikit-learn library by up to 100x-400x. It is also 10x-100x faster than the above baseline, with similar numerical stability. The distributed algorithm demonstrates linear scalability with near-negligible communication overhead.
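To make the idea concrete, the following is a minimal NumPy sketch of how a Householder-based summary of a tall dataset can stand in for the full data in a least-squares solve. It is an illustration under stated assumptions, not the authors' implementation: the function names `householder_sketch`, `merge_sketches`, and `solve_from_sketch` are hypothetical, the data is assumed dense with more rows than columns plus one, and `np.linalg.qr` is used because it delegates to LAPACK's Householder-based QR routines. The merge step assumes a simple one-level scheme in which each worker sends its small sketch to a coordinator.

```python
import numpy as np

def householder_sketch(A, b):
    """Compress (A, b) with n rows into a d x d triangle R and a length-d vector c.

    For every x, ||A x - b||^2 = ||R x - c||^2 + const, so the sketch has the
    same least-squares minimizer as the full data.
    """
    d = A.shape[1]
    # QR of the augmented matrix [A | b]; only the triangular factor is needed.
    R_aug = np.linalg.qr(np.column_stack([A, b]), mode="r")
    return R_aug[:d, :d], R_aug[:d, d]

def merge_sketches(sketches):
    """Combine per-worker sketches (hypothetical one-level merge)."""
    stacked = np.vstack([np.column_stack([R, c]) for R, c in sketches])
    d = stacked.shape[1] - 1
    R_aug = np.linalg.qr(stacked, mode="r")
    return R_aug[:d, :d], R_aug[:d, d]

def solve_from_sketch(R, c, ridge=0.0):
    """Minimize ||R x - c||^2 + ridge * ||x||^2 using only the sketch."""
    d = R.shape[1]
    # Ridge is handled by appending sqrt(ridge) * I rows (no intercept term).
    lhs = np.vstack([R, np.sqrt(ridge) * np.eye(d)])
    rhs = np.concatenate([c, np.zeros(d)])
    return np.linalg.lstsq(lhs, rhs, rcond=None)[0]
```

Because the sketch preserves both A^T A (as R^T R) and A^T b (as R^T c), any solver whose objective depends on the data only through the squared residual should see the same problem at a size independent of the number of rows. Each worker would transmit only d(d+1) numbers in this scheme, which is consistent with the low communication overhead the abstract reports.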

[1]  Jyotikrishna Dass,et al.  Fast and Communication-Efficient Algorithm for Distributed Support Vector Machine Training , 2019, IEEE Transactions on Parallel and Distributed Systems.

[2]  Rinshu Dwivedi,et al.  The incubation period of coronavirus disease (COVID‐19): A tremendous public health threat—Forecasting from publicly available case data in India , 2021, Journal of public affairs.

[3]  G. D'Angelo,et al.  Combining least absolute shrinkage and selection operator (LASSO) and principal-components analysis for detection of gene-gene interactions in genome-wide association studies , 2009, BMC proceedings.

[4]  C. Carathéodory Über den Variabilitätsbereich der Koeffizienten von Potenzreihen, die gegebene Werte nicht annehmen , 1907 .

[5]  Ibrahim Jubran,et al.  Introduction to Coresets: Accurate Coresets , 2019, ArXiv.

[6]  Hannah R. Meredith,et al.  The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application , 2020, Annals of Internal Medicine.

[7]  Xiaodong Lin,et al.  Privacy preserving regression modelling via distributed computation , 2004, KDD.

[8]  Michael A. Saunders,et al.  LSRN: A Parallel Iterative Solver for Strongly Over- or Underdetermined Systems , 2011, SIAM J. Sci. Comput..

[9]  David P. Woodruff,et al.  Numerical linear algebra in the streaming model , 2009, STOC '09.

[10]  Jeff M. Phillips,et al.  Coresets and Sketches , 2016, ArXiv.

[11]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[12]  James Demmel,et al.  Reconstructing Householder Vectors from Tall-Skinny QR , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.

[13]  David P. Woodruff,et al.  Coresets and sketches for high dimensional subspace approximation problems , 2010, SODA '10.

[14]  James Demmel,et al.  Communication-optimal Parallel and Sequential QR and LU Factorizations , 2008, SIAM J. Sci. Comput..

[15]  M. Rozložník Numerics of Gram-Schmidt orthogonalization , 2007 .

[16]  Michael A. Saunders,et al.  LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares , 1982, TOMS.

[17]  S. Muthukrishnan,et al.  Sampling algorithms for l2 regression and applications , 2006, SODA '06.

[18]  A. Kidd,et al.  Survival prediction in mesothelioma using a scalable Lasso regression model: instructions for use and initial performance using clinical predictors , 2018, BMJ Open Respiratory Research.

[19]  Christian S. Jensen,et al.  Building Accurate 3D Spatial Networks to Enable Next Generation Intelligent Transportation Systems , 2013, 2013 IEEE 14th International Conference on Mobile Data Management.

[20]  R. Ho,et al.  Immediate Psychological Responses and Associated Factors during the Initial Stage of the 2019 Coronavirus Disease (COVID-19) Epidemic among the General Population in China , 2020, International journal of environmental research and public health.

[21]  G. Amdahl,et al.  Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).

[22]  R. Tibshirani,et al.  Regression shrinkage and selection via the lasso: a retrospective , 2011 .

[23]  G. Pandey,et al.  SEIR and Regression Model based COVID-19 outbreak predictions in India , 2020, medRxiv.

[24]  Christian H. Bischof,et al.  The WY representation for products of householder matrices , 1985, PPSC.

[25]  Michael A. Saunders,et al.  LSMR: An Iterative Algorithm for Sparse Least-Squares Problems , 2010, SIAM J. Sci. Comput..

[26]  Ibrahim Jubran,et al.  Fast and Accurate Least-Mean-Squares Solvers for High Dimensional Data , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27]  Michael B. Miller Linear Regression Analysis , 2013 .

[28]  A. George,et al.  Solution of sparse linear least squares problems using givens rotations , 1980 .

[29]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[30]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[31]  Sivan Toledo,et al.  Blendenpik: Supercharging LAPACK's Least-Squares Solver , 2010, SIAM J. Sci. Comput..

[32]  Robert A. van de Geijn,et al.  Elemental: A New Framework for Distributed Memory Dense Matrix Computations , 2013, TOMS.

[33]  G. Stewart,et al.  A block QR algorithm and the singular value decomposition , 1993 .

[34]  Ion Stoica,et al.  Helen: Maliciously Secure Coopetitive Learning for Linear Models , 2019, 2019 IEEE Symposium on Security and Privacy (SP).

[35]  Haim Avron,et al.  Blendenpik: Supercharging LAPACK's Least-Squares Solver , 2010 .

[36]  T. Stengos,et al.  On the determinants of bitcoin returns: A LASSO approach , 2018, Finance Research Letters.

[37]  Mariana Raykova,et al.  Privacy-Preserving Distributed Linear Regression on High-Dimensional Data , 2017, Proc. Priv. Enhancing Technol..

[38]  Arthur E. Hoerl,et al.  Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[39]  Inderjit S. Dhillon,et al.  Memory Efficient Kernel Approximation , 2014, ICML.

[40]  Atsushi Nitanda,et al.  Stochastic Proximal Gradient Descent with Acceleration Techniques , 2014, NIPS.

[41]  Avi Wigderson,et al.  Completeness theorems for non-cryptographic fault-tolerant distributed computation , 1988, STOC '88.

[42]  Sjsu ScholarWorks,et al.  Rank revealing QR factorizations , 2014 .