A Stochastic Majorize-Minimize Subspace Algorithm for Online Penalized Least Squares Estimation

Stochastic approximation techniques play an important role in solving many problems encountered in machine learning and adaptive signal processing. In these contexts, the statistics of the data are often unknown a priori, or their direct computation is too intensive, so they must be estimated online from the observed signals. For the batch optimization of an objective function that is the sum of a data fidelity term and a penalization (e.g., a sparsity-promoting function), Majorize-Minimize (MM) methods have recently attracted much interest because they are fast, highly flexible, and effective in ensuring convergence. The goal of this paper is to show how these methods can be successfully extended to the case where the data fidelity term is a least squares criterion and the cost function is replaced by a sequence of stochastic approximations of it. In this context, we propose an online version of an MM subspace algorithm and study its convergence using suitable probabilistic tools. Simulation results illustrate the good practical performance of the proposed algorithm, associated with a memory gradient subspace, when applied to both nonadaptive and adaptive filter identification problems.
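To make the ingredients of the abstract concrete, the following is a minimal sketch, not the authors' exact algorithm, of an online MM memory gradient iteration for penalized least squares. It assumes: (i) the second-order statistics are replaced by recursive sample averages R_n of x xᵀ and r_n of y x, so the cost at time n is F_n(h) = ½ hᵀR_n h − r_nᵀh + λ Σ ψ(h_i); (ii) a hyperbolic penalty ψ(t) = sqrt(t² + δ²) as a smooth sparsity-promoting function; (iii) a half-quadratic majorant with curvature A_n = R_n + λ diag(ψ'(h_i)/h_i); and (iv) a two-direction memory gradient subspace spanned by −∇F_n(h_n) and h_n − h_{n−1}. The function and parameter names are hypothetical, and the step rules, majorant, and averaging used in the paper may differ.

```python
import numpy as np


def penalty_weights(h, delta):
    # Half-quadratic weights psi'(t)/t for the ASSUMED hyperbolic penalty
    # psi(t) = sqrt(t^2 + delta^2); finite everywhere, including t = 0.
    return 1.0 / np.sqrt(h**2 + delta**2)


def stochastic_mm_memory_gradient(X, y, lam=0.1, delta=1e-2):
    """Hypothetical sketch of an online MM memory-gradient estimator.

    X: (N, p) regressors observed sequentially; y: (N,) observations.
    At time n the cost is F_n(h) = 0.5 h^T R_n h - r_n^T h + lam * sum psi(h_i),
    where R_n and r_n are running sample averages of x x^T and y x.
    """
    N, p = X.shape
    R = np.zeros((p, p))
    r = np.zeros(p)
    h = np.zeros(p)
    h_prev = np.zeros(p)
    for n in range(N):
        x = X[n]
        # Recursive sample averages of the second-order statistics.
        R += (np.outer(x, x) - R) / (n + 1)
        r += (y[n] * x - r) / (n + 1)
        # Gradient of F_n at the current estimate.
        w = penalty_weights(h, delta)
        grad = R @ h - r + lam * w * h
        # Memory-gradient subspace: steepest descent and previous move.
        D = np.column_stack([-grad, h - h_prev])
        # Curvature of the quadratic (half-quadratic) majorant of F_n at h.
        A = R + lam * np.diag(w)
        # Minimize the majorant over h + span(D); the pseudo-inverse
        # handles a rank-deficient D (e.g. h == h_prev at the first step).
        B = D.T @ A @ D
        u = -np.linalg.pinv(B) @ (D.T @ grad)
        h_prev = h
        h = h + D @ u
    return h
```

Each iteration costs one small (2 × 2) linear solve on top of the statistics update, which is the practical appeal of a subspace MM step. For an adaptive (time-varying) system, one would presumably replace the 1/(n+1) averaging with an exponential forgetting factor; that variant is also an assumption here.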
