Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

Author(s): Koanantakool, P; Ali, A; Azad, A; Buluc, A; Morozov, D; Oliker, L; Yelick, K; Oh, SY

Copyright 2018 by the author(s).

Abstract: Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ≈819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.
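The two problem sizes quoted in the abstract (≈819 billion parameters, 1.28 million dimensions) are consistent with each other if "parameters" is read as the number of free entries in a symmetric p × p matrix. That reading is an assumption here, since the abstract does not define the count. The helper name `symmetric_param_count` below is hypothetical and only illustrates the arithmetic:

```python
def symmetric_param_count(p: int, include_diagonal: bool = True) -> int:
    """Number of free entries in a symmetric p x p matrix.

    Assumption (not stated in the abstract): the reported parameter count
    is the number of distinct entries of the symmetric estimate.
    """
    off_diagonal = p * (p - 1) // 2  # entries strictly above the diagonal
    return off_diagonal + p if include_diagonal else off_diagonal


p = 1_280_000  # 1.28 million dimensions, as quoted in the abstract
count = symmetric_param_count(p)
print(f"{count:,} parameters (~{count / 1e9:.0f} billion)")
# prints "819,200,640,000 parameters (~819 billion)"
```

Whether or not the diagonal is counted, the total rounds to roughly 819 billion, so the choice does not affect the comparison with the abstract's figure.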
