Continuation of Nesterov’s Smoothing for Regression With Structured Sparsity in High-Dimensional Neuroimaging

Predictive models can be used on high-dimensional brain images to decode cognitive states or to predict the diagnosis or prognosis of a clinical condition or its evolution. Spatial regularization through structured sparsity offers new perspectives in this context: it reduces the risk of overfitting the model while providing interpretable neuroimaging signatures by forcing the solution to adhere to domain-specific constraints. Total variation (TV) is a promising candidate for structured penalization: it enforces spatial smoothness of the solution while segmenting predictive regions from the background. We consider the problem of minimizing the sum of a smooth convex loss, a non-smooth convex penalty (whose proximal operator is known) and a wide range of possible complex, non-smooth convex structured penalties such as TV or overlapping group Lasso. Existing solvers are limited either in the functions they can minimize or in their practical capacity to scale to high-dimensional imaging data. Nesterov’s smoothing technique can be used to minimize a large number of non-smooth convex structured penalties. However, reasonable precision requires a small smoothing parameter, which slows down the convergence speed to unacceptable levels. To benefit from the versatility of Nesterov’s smoothing technique, we propose a first-order continuation algorithm, CONESTA, which automatically generates a sequence of decreasing smoothing parameters. The generated sequence maintains the optimal convergence speed toward any globally desired precision. Our main contributions are as follows. We propose an expression of the duality gap to probe the current distance to the global optimum in order to adapt the smoothing parameter and the convergence speed. This expression is applicable to many penalties and can be used with solvers other than CONESTA. We also propose an expression for the particular smoothing parameter that minimizes the number of iterations required to reach a given precision. Furthermore, we provide a convergence proof and its rate, which is an improvement over classical proximal gradient smoothing methods. We demonstrate on both simulated and high-dimensional structural neuroimaging data that CONESTA significantly outperforms many state-of-the-art solvers in regard to convergence speed and precision.
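
The trade-off described above, between the precision of the smoothed penalty and the speed of the solver, can be illustrated on a toy problem. The following Python sketch is not CONESTA: it assumes a 1D signal, a TV penalty built from first differences, a least-squares denoising loss, plain gradient descent, and a fixed, hand-picked sequence of decreasing smoothing parameters (CONESTA instead derives this sequence from the duality gap). All function names are hypothetical.

```python
import numpy as np

def tv_1d(beta):
    """Exact 1D total variation: sum of absolute first differences."""
    return np.abs(np.diff(beta)).sum()

def smoothed_tv_1d(beta, mu):
    """Nesterov-smoothed 1D TV and its gradient.

    s_mu(beta) = max_{|alpha_i| <= 1} alpha^T A beta - (mu / 2) ||alpha||^2,
    where A is the first-difference operator.
    """
    beta = np.asarray(beta, dtype=float)
    z = np.diff(beta)                       # A @ beta
    alpha = np.clip(z / mu, -1.0, 1.0)      # maximizer of the dual problem
    value = alpha @ z - 0.5 * mu * (alpha @ alpha)
    grad = np.zeros_like(beta)              # A^T @ alpha
    grad[:-1] -= alpha
    grad[1:] += alpha
    return value, grad

def smoothed_objective(beta, y, lam, mu):
    """0.5 ||beta - y||^2 + lam * s_mu(beta)."""
    return 0.5 * np.sum((beta - y) ** 2) + lam * smoothed_tv_1d(beta, mu)[0]

def denoise_with_continuation(y, lam, mus=(1.0, 0.1, 0.01), n_iter=200):
    """Gradient descent on the smoothed objective, warm-started across a
    decreasing sequence of smoothing parameters (assumed, fixed schedule)."""
    y = np.asarray(y, dtype=float)
    beta = y.copy()
    for mu in mus:
        # The gradient of s_mu is Lipschitz with constant ||A||^2 / mu, and
        # ||A||^2 <= 4 for first differences; the loss contributes 1.
        step = 1.0 / (1.0 + lam * 4.0 / mu)
        for _ in range(n_iter):
            _, g = smoothed_tv_1d(beta, mu)
            beta = beta - step * ((beta - y) + lam * g)
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.repeat([0.0, 1.0, 0.0], 20)
    y = truth + 0.2 * rng.standard_normal(truth.size)
    beta = denoise_with_continuation(y, lam=1.0)
    print("TV of noisy signal:   ", tv_1d(y))
    print("TV of denoised signal:", tv_1d(beta))
```

The step size shrinks in proportion to the smoothing parameter, which is why a small parameter used from the start is slow; warm-starting from the solution obtained with a larger parameter is the basic idea behind continuation.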
