Fast, Provably Convergent IRLS Algorithm for p-norm Linear Regression

Linear regression in the $\ell_p$-norm is a canonical optimization problem that arises in several applications, including sparse recovery, semi-supervised learning, and signal processing. Generic convex optimization algorithms for solving $\ell_p$-regression are slow in practice. Iteratively Reweighted Least Squares (IRLS) is an easy-to-implement family of algorithms for solving these problems that has been studied for over 50 years. However, these algorithms often diverge for $p > 3$, and since the work of Osborne (1985), it has been an open problem whether there is an IRLS algorithm that is guaranteed to converge rapidly for $p > 3$. We propose p-IRLS, the first IRLS algorithm that provably converges geometrically for any $p \in [2,\infty)$. Our algorithm is simple to implement and is guaranteed to find a $(1+\varepsilon)$-approximate solution in $O(p^{3.5} m^{\frac{p-2}{2(p-1)}} \log \frac{m}{\varepsilon}) \le O_p(\sqrt{m} \log \frac{m}{\varepsilon})$ iterations. Our experiments demonstrate that it performs even better than our theoretical bounds suggest, beats the standard MATLAB/CVX implementation for solving these problems by 10--50x, and is the fastest among available implementations in the high-accuracy regime.
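To make the IRLS idea concrete, the following is a minimal textbook-style sketch (not the paper's p-IRLS): each iteration reweights the residuals of $\min_x \|Ax - b\|_p$ by $w_i = |r_i|^{p-2}$ and solves the resulting weighted least-squares problem. The function name, the clamping constant `eps`, and the iteration cap are illustrative choices, not from the paper; as the abstract notes, this undamped scheme can oscillate or diverge for $p > 3$, which is exactly the failure mode p-IRLS avoids.

```python
import numpy as np

def irls_pnorm(A, b, p=4, iters=100, eps=1e-8, tol=1e-10):
    """Basic (undamped) IRLS sketch for min_x ||Ax - b||_p, p >= 2.

    Each step solves the weighted least-squares problem
        min_x sum_i w_i (a_i^T x - b_i)^2,  w_i = |r_i|^(p-2),
    via its normal equations A^T W A x = A^T W b.
    """
    x = np.linalg.lstsq(A, b, rcond=None)[0]  # start from the 2-norm solution
    for _ in range(iters):
        r = A @ x - b
        # Clamp residuals away from zero so no weight vanishes exactly.
        w = np.maximum(np.abs(r), eps) ** (p - 2)
        AtW = A.T * w  # broadcasting scales column j of A.T by w_j, i.e. A^T W
        x_new = np.linalg.solve(AtW @ A, AtW @ b)
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

On a consistent system (b in the range of A) the residual can be driven to zero and the iteration is stationary; on inconsistent systems with large p, convergence of this plain scheme is not guaranteed, which motivates the damping/width analysis behind p-IRLS.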
