Graph-Based Sparse Learning: Models, Algorithms, and Applications

Sparse learning is a powerful tool for building highly interpretable models of high-dimensional data, and it has many important applications in areas such as bioinformatics, medical image processing, and computer vision. Recently, incorporating a priori structural information has been shown to be effective in improving the performance of sparse learning models, and a graph is a fundamental way to represent structural relationships among features. This dissertation focuses on graph-based sparse learning. The first part aims to integrate a given graph into sparse learning to improve performance. Specifically, the problem of feature grouping and selection over a given undirected graph is considered, and three models with efficient solvers are proposed to achieve simultaneous feature grouping and selection, thereby enhancing estimation accuracy. A major remaining challenge is that solving large-scale graph-based sparse learning problems is still computationally demanding. An efficient, scalable, and parallel algorithm is therefore proposed for one widely used graph-based approach, anisotropic total variation regularization, by explicitly exploiting the structure of the underlying graph. The second part focuses on uncovering the graph structure from the data. Two problems in graphical modeling are considered: the joint estimation of multiple graphical models using a fused lasso penalty, and the estimation of hierarchical graphical models. The key technical contribution is a necessary and sufficient condition for the estimated graphs to be decomposable. Based on this property, a simple screening rule is presented that reduces the size of the optimization problem and thereby dramatically lowers the computational cost.
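For orientation, the display below is a generic sketch of how an undirected feature graph typically enters such models, not necessarily the exact formulation studied in this dissertation; the design matrix A, response b, coefficient vector x, and regularization parameters lambda_1 and lambda_2 are illustrative notation:

\[
\min_{x \in \mathbb{R}^p} \; \frac{1}{2}\,\|Ax - b\|_2^2 \;+\; \lambda_1 \|x\|_1 \;+\; \lambda_2 \sum_{(i,j) \in E} |x_i - x_j|, \qquad G = (V, E).
\]

The l1 term promotes feature selection, while the edge-wise term encourages features connected in G to share a common value (grouping); when G is a regular grid graph, the edge-wise term is exactly the anisotropic total variation penalty used in image processing.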

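To illustrate the flavor of the screening idea in the second part, the sketch below implements the classical covariance-thresholding rule for the single graphical lasso (Witten et al.; Mazumder and Hastie): the connected components of the thresholded sample covariance coincide with the block structure of the solution, so the problem splits into independent subproblems. The dissertation's contribution concerns the harder fused multiple-graph and hierarchical settings; the function name glasso_screening_blocks and the NumPy-based component search are illustrative choices, not the dissertation's code.

import numpy as np

def glasso_screening_blocks(S, lam):
    """Covariance-thresholding screening for the graphical lasso (a sketch).

    Classical result (Witten et al., 2011; Mazumder & Hastie, 2012): the
    connected components of the graph {(i, j): |S_ij| > lam, i != j} match
    the block-diagonal structure of the graphical lasso solution, so the
    problem can be solved independently on each block.
    """
    p = S.shape[0]
    # Threshold the sample covariance; ignore the diagonal.
    adj = (np.abs(S) > lam) & ~np.eye(p, dtype=bool)

    # Depth-first search to label connected components.
    labels = -np.ones(p, dtype=int)
    comp = 0
    for start in range(p):
        if labels[start] >= 0:
            continue
        labels[start] = comp
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if labels[v] < 0:
                    labels[v] = comp
                    stack.append(v)
        comp += 1
    return labels  # labels[i] = index of the block containing variable i

# Illustrative use: S = np.cov(X, rowvar=False); blocks = glasso_screening_blocks(S, 0.3)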
[1]  A. Rinaldo Properties and refinements of the fused lasso , 2008, 0805.0234.

[2]  M. Sion On general minimax theorems , 1958 .

[3]  Yong Zhang,et al.  An augmented Lagrangian approach for sparse principal component analysis , 2009, Mathematical Programming.

[4]  Patrick L. Combettes,et al.  Proximal Splitting Methods in Signal Processing , 2009, Fixed-Point Algorithms for Inverse Problems in Science and Engineering.

[5]  Jorge Nocedal,et al.  Newton-Like Methods for Sparse Inverse Covariance Estimation , 2012, NIPS.

[6]  Michael A. Saunders,et al.  Proximal Newton-type Methods for Minimizing Convex Objective Functions in Composite Form , 2012, NIPS 2012.

[7]  Stephen P. Boyd,et al.  An ADMM Algorithm for a Class of Total Variation Regularized Estimation Problems , 2012, 1203.1828.

[8]  O. SIAMJ.,et al.  SMOOTH OPTIMIZATION APPROACH FOR SPARSE COVARIANCE SELECTION∗ , 2009 .

[9]  Xiaoming Yuan,et al.  Alternating Direction Method for Covariance Selection Models , 2011, Journal of Scientific Computing.

[10]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[11]  Volkan Cevher,et al.  A proximal Newton framework for composite minimization: Graph learning without Cholesky decompositions and matrix inversions , 2013, ICML.

[12]  Jean-Philippe Vert,et al.  Group lasso with overlap and graph lasso , 2009, ICML '09.

[13]  Katya Scheinberg,et al.  Practical inexact proximal quasi-Newton method with global complexity analysis , 2013, Mathematical Programming.

[14]  Le Song,et al.  Estimating time-varying networks , 2008, ISMB 2008.

[15]  Jieping Ye,et al.  Simultaneous feature and feature group selection through hard thresholding , 2014, KDD.

[16]  Jieping Ye,et al.  Feature grouping and selection over an undirected graph , 2012, KDD.

[17]  Jieping Ye,et al.  An efficient ADMM algorithm for multidimensional anisotropic total variation regularization problems , 2013, KDD.

[18]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[19]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[20]  M. Yuan,et al.  Model selection and estimation in the Gaussian graphical model , 2007 .

[21]  Yin Zhang,et al.  User’s Guide for TVAL3: TV Minimization by Augmented Lagrangian and Alternating Direction Algorithms , 2010 .

[22]  ANTONIN CHAMBOLLE,et al.  An Algorithm for Total Variation Minimization and Applications , 2004, Journal of Mathematical Imaging and Vision.

[23]  Julien Mairal,et al.  Proximal Methods for Sparse Hierarchical Dictionary Learning , 2010, ICML.

[24]  Hongzhe Li,et al.  In Response to Comment on "Network-constrained regularization and variable selection for analysis of genomic data" , 2008, Bioinform..

[25]  Patrick Danaher,et al.  The joint graphical lasso for inverse covariance estimation across multiple classes , 2011, Journal of the Royal Statistical Society. Series B, Statistical methodology.

[26]  Jing Li,et al.  Learning Brain Connectivity of Alzheimer's Disease from Neuroimaging Data , 2009, NIPS.

[27]  Alexandre d'Aspremont,et al.  Model Selection Through Sparse Max Likelihood Estimation Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data , 2022 .

[28]  J. Friedman,et al.  New Insights and Faster Computations for the Graphical Lasso , 2011 .

[29]  N. Meinshausen,et al.  High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.

[30]  Marc Teboulle,et al.  A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems , 2009, SIAM J. Imaging Sci..

[31]  R. Tibshirani,et al.  Sparsity and smoothness via the fused lasso , 2005 .

[32]  Trevor J. Hastie,et al.  Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso , 2011, J. Mach. Learn. Res..

[33]  P. Zhao,et al.  The composite absolute penalties family for grouped and hierarchical variable selection , 2009, 0909.0411.

[34]  Larry A. Wasserman,et al.  Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models , 2010, NIPS.

[35]  Lu Li,et al.  An inexact interior point method for L1-regularized sparse covariance selection , 2010, Math. Program. Comput..

[36]  Jieping Ye,et al.  Efficient Sparse Group Feature Selection via Nonconvex Optimization , 2012, ICML.

[37]  E. Xing,et al.  Statistical Estimation of Correlated Genome Associations to a Quantitative Trait Network , 2009, PLoS genetics.

[38]  Junzhou Huang,et al.  Efficient MR Image Reconstruction for Compressed MR Imaging , 2010, MICCAI.

[39]  E. Levina,et al.  Joint estimation of multiple graphical models. , 2011, Biometrika.

[40]  Chia-Hua Ho,et al.  An improved GLMNET for l1-regularized logistic regression , 2011, J. Mach. Learn. Res..

[41]  Kim-Chuan Toh,et al.  Solving Log-Determinant Optimization Problems by a Newton-CG Primal Proximal Point Algorithm , 2010, SIAM J. Optim..

[42]  Mark W. Schmidt,et al.  Convergence Rates of Inexact Proximal-Gradient Methods for Convex Optimization , 2011, NIPS.

[43]  Jieping Ye,et al.  Efficient Methods for Overlapping Group Lasso , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Tong Zhang Multi-stage Convex Relaxation for Feature Selection , 2011, 1106.0565.

[45]  Peter Wonka,et al.  Fused Multiple Graphical Lasso , 2012, SIAM J. Optim..

[46]  Leon Wenliang Zhong,et al.  Efficient Sparse Modeling With Automatic Feature Grouping , 2011, IEEE Transactions on Neural Networks and Learning Systems.

[47]  Zhaosong Lu,et al.  Adaptive First-Order Methods for General Sparse Inverse Covariance Selection , 2009, SIAM J. Matrix Anal. Appl..

[48]  Katya Scheinberg,et al.  IBM Research Report SINCO - A Greedy Coordinate Ascent Method for Sparse Inverse Covariance Selection Problem , 2009 .

[49]  H. Bondell,et al.  Simultaneous Regression Shrinkage, Variable Selection, and Supervised Clustering of Predictors with OSCAR , 2008, Biometrics.

[50]  R. Tibshirani,et al.  Sparse inverse covariance estimation with the graphical lasso. , 2008, Biostatistics.

[51]  Yurii Nesterov,et al.  Smooth minimization of non-smooth functions , 2005, Math. Program..

[52]  Stephen P. Boyd,et al.  Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers , 2011, Found. Trends Mach. Learn..

[53]  Trevor J. Hastie,et al.  The Graphical Lasso: New Insights and Alternatives , 2011, Electronic journal of statistics.

[54]  Paul Tseng,et al.  A coordinate gradient descent method for nonsmooth separable minimization , 2008, Math. Program..

[55]  John N. Tsitsiklis,et al.  Introduction to linear optimization , 1997, Athena scientific optimization and computation series.

[56]  Shiqian Ma,et al.  An alternating direction method for total variation denoising , 2011, Optim. Methods Softw..

[57]  Hongliang Fei,et al.  Regularization and feature selection for networked features , 2010, CIKM '10.

[58]  Eric P. Xing,et al.  On Time Varying Undirected Graphs , 2011, AISTATS.

[59]  Xiaotong Shen,et al.  Adaptive Model Selection , 2002 .

[60]  Jieping Ye,et al.  An efficient algorithm for a class of fused lasso problems , 2010, KDD.

[61]  Stephen J. Wright,et al.  Sparse Reconstruction by Separable Approximation , 2008, IEEE Transactions on Signal Processing.

[62]  Jieping Ye,et al.  Moreau-Yosida Regularization for Grouped Tree Structure Learning , 2010, NIPS.

[63]  Tom Goldstein,et al.  The Split Bregman Method for L1-Regularized Problems , 2009, SIAM J. Imaging Sci..

[64]  Shiqian Ma,et al.  An efficient algorithm for compressed MR imaging using total variation and wavelets , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  N. Tzourio-Mazoyer,et al.  Automated Anatomical Labeling of Activations in SPM Using a Macroscopic Anatomical Parcellation of the MNI MRI Single-Subject Brain , 2002, NeuroImage.

[66]  Dimitris Samaras,et al.  Multi-Task Learning of Gaussian Graphical Models , 2010, ICML.

[67]  Junfeng Yang,et al.  An Efficient TVL1 Algorithm for Deblurring Multichannel Images Corrupted by Impulsive Noise , 2009, SIAM J. Sci. Comput..

[68]  Marc Teboulle,et al.  Fast Gradient-Based Algorithms for Constrained Total Variation Image Denoising and Deblurring Problems , 2009, IEEE Transactions on Image Processing.

[69]  Kuncheng Li,et al.  Altered functional connectivity in early Alzheimer's disease: A resting‐state fMRI study , 2007, Human brain mapping.

[70]  Alexandre d'Aspremont,et al.  First-Order Methods for Sparse Covariance Selection , 2006, SIAM J. Matrix Anal. Appl..

[71]  Xiaotong Shen,et al.  Grouping Pursuit Through a Regularization Solution Surface , 2010, Journal of the American Statistical Association.

[72]  Dimitri P. Bertsekas,et al.  On the Douglas—Rachford splitting method and the proximal point algorithm for maximal monotone operators , 1992, Math. Program..

[73]  Pradeep Ravikumar,et al.  Sparse inverse covariance matrix estimation using quadratic approximation , 2011, MLSLP.

[74]  Paul M. Thompson,et al.  Multi-source learning with block-wise missing data for Alzheimer's disease prediction , 2013, KDD.

[75]  R. Tibshirani,et al.  PATHWISE COORDINATE OPTIMIZATION , 2007, 0708.1485.

[76]  Pham Dinh Tao,et al.  Duality in D.C. (Difference of Convex functions) Optimization. Subgradient Methods , 1988 .

[77]  Shiqian Ma,et al.  Sparse Inverse Covariance Selection via Alternating Linearization Methods , 2010, NIPS.

[78]  Stephen J. Wright,et al.  Active Set Identification in Nonlinear Programming , 2006, SIAM J. Optim..

[79]  Suvrit Sra,et al.  Fast Newton-type Methods for Total Variation Regularization , 2011, ICML.

[80]  Laurent Condat,et al.  A Direct Algorithm for 1-D Total Variation Denoising , 2013, IEEE Signal Processing Letters.

[81]  Qingyang Li,et al.  A Highly Scalable Parallel Algorithm for Isotropic Total Variation Models , 2014, ICML.

[82]  M. Yuan,et al.  Model selection and estimation in regression with grouped variables , 2006 .

[83]  Anulekha Dhara,et al.  Optimality Conditions in Convex Optimization: A Finite-Dimensional View , 2011 .

[84]  Bingsheng He,et al.  On the O(1/n) Convergence Rate of the Douglas-Rachford Alternating Direction Method , 2012, SIAM J. Numer. Anal..

[85]  L. Rudin,et al.  Nonlinear total variation based noise removal algorithms , 1992 .

[86]  Michael I. Jordan,et al.  Multiple kernel learning, conic duality, and the SMO algorithm , 2004, ICML.

[87]  C. Grady,et al.  Intercorrelations of regional cerebral glucose metabolic rates in Alzheimer's disease , 1987, Brain Research.

[88]  H. Zou,et al.  Regularization and variable selection via the elastic net , 2005 .

[89]  T. P. Dinh,et al.  Convex analysis approach to d.c. programming: Theory, Algorithm and Applications , 1997 .

[90]  Pradeep Ravikumar,et al.  A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation , 2012, NIPS.

[91]  Seungyeop Han,et al.  Structured Learning of Gaussian Graphical Models , 2012, NIPS.

[92]  T. Ideker,et al.  Network-based classification of breast cancer metastasis , 2007, Molecular systems biology.

[93]  Takashi Washio,et al.  Common Substructure Learning of Multiple Graphical Gaussian Models , 2011, ECML/PKDD.

[94]  Curtis R. Vogel,et al.  Ieee Transactions on Image Processing Fast, Robust Total Variation{based Reconstruction of Noisy, Blurred Images , 2022 .