Inertial Stochastic PALM and its Application for Learning Student-t Mixture Models

Inertial algorithms for minimizing nonsmooth and nonconvex functions, such as the inertial proximal alternating linearized minimization algorithm (iPALM), have demonstrated their superiority with respect to computation time over their non-inertial variants. In many problems in imaging and machine learning, the objective functions have a special form involving huge amounts of data, which encourages the application of stochastic algorithms. While the stochastic gradient descent algorithm is still used in the majority of applications, stochastic algorithms for minimizing nonsmooth and nonconvex functions have recently been proposed as well. In this paper, we derive an inertial variant of the SPRING algorithm, called iSPRING, and prove linear convergence of the algorithm under certain assumptions. Numerical experiments show that our new algorithm performs better than SPRING and its deterministic counterparts, although the improvement of the inertial stochastic approach over its non-inertial variant is not as large as that of the inertial deterministic one. The second aim of the paper is to demonstrate that (inertial) PALM, both in its deterministic and stochastic form, can be used for learning the parameters of Student-$t$ mixture models. We prove that the objective function of such models fulfills all convergence assumptions of the algorithms and demonstrate their performance by numerical examples.
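For context, the maximum likelihood objective behind the second part can be written in the following standard form; the exact parametrization, constraints, and possible regularization used in the paper may differ. Given samples $x_1,\dots,x_n \in \mathbb{R}^d$, a Student-$t$ mixture with $K$ components has weights $\alpha_k \ge 0$ with $\sum_k \alpha_k = 1$, degrees of freedom $\nu_k > 0$, locations $\mu_k$, and scatter matrices $\Sigma_k \succ 0$, and the negative log-likelihood reads

$$
L(\alpha,\nu,\mu,\Sigma) = -\sum_{i=1}^{n} \log\Big( \sum_{k=1}^{K} \alpha_k\, f(x_i \mid \nu_k, \mu_k, \Sigma_k) \Big), \qquad
f(x \mid \nu,\mu,\Sigma) = \frac{\Gamma\big(\tfrac{\nu+d}{2}\big)}{\Gamma\big(\tfrac{\nu}{2}\big)\,(\nu\pi)^{d/2}\,|\Sigma|^{1/2}} \Big(1 + \tfrac{1}{\nu}(x-\mu)^{\mathrm T}\Sigma^{-1}(x-\mu)\Big)^{-\frac{\nu+d}{2}}.
$$

This finite-sum structure over the samples is what makes stochastic, variance-reduced gradient estimators attractive. The following is a minimal sketch, not the paper's implementation, of what a single block update of an inertial stochastic PALM-type (iSPRING-style) step could look like; the function names, the $\ell_1$ prox used for illustration, and the parameter choices are assumptions made for the example.

```python
import numpy as np

def ispring_block_update(x, x_prev, stoch_grad, prox, step, beta):
    """One block update of an inertial stochastic PALM-type step (sketch).

    x, x_prev  : current and previous iterate of this block
    stoch_grad : callable returning a stochastic gradient estimate of the
                 smooth coupling term (e.g. a SAGA- or SARAH-type estimator)
    prox       : proximal operator of the block's nonsmooth term
    step       : step size, typically tied to a (local) Lipschitz constant
    beta       : inertial/extrapolation parameter in [0, 1)
    """
    # Inertial extrapolation along the previous direction of travel.
    y = x + beta * (x - x_prev)
    # Forward step with a stochastic gradient estimate at the extrapolated point.
    g = stoch_grad(y)
    # Backward (proximal) step on the nonsmooth part of this block.
    return prox(y - step * g, step)


# Illustrative usage with an l1 proximal operator (soft thresholding).
def prox_l1(v, step, lam=0.1):
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

rng = np.random.default_rng(0)
x_prev = np.zeros(5)
x = rng.standard_normal(5)
x_next = ispring_block_update(x, x_prev,
                              stoch_grad=lambda y: y,  # placeholder estimator
                              prox=prox_l1, step=0.5, beta=0.4)
```

In the full algorithm, updates of this type are applied alternately to the parameter blocks (for the mixture model: weights, degrees of freedom, locations, and scatter matrices), with the stochastic gradient estimate computed from a mini-batch of the samples.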
