Positive Semidefinite Rank-Based Correlation Matrix Estimation With Application to Semiparametric Graph Estimation

Many statistical methods gain robustness and flexibility by sacrificing convenient computational structures. In this article, we illustrate this fundamental tradeoff by studying a semiparametric graph estimation problem in high dimensions. We explain how novel computational techniques help to solve this type of problem. In particular, we propose a nonparanormal neighborhood pursuit algorithm to estimate high-dimensional semiparametric graphical models with theoretical guarantees. Moreover, we provide an alternative view to analyze the tradeoff between computational efficiency and statistical error under a smoothing optimization framework. Though this article focuses on the problem of graph estimation, the proposed methodology is widely applicable to other problems with similar structures. We also report thorough experimental results on text, stock, and genomic datasets.
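To make the pipeline the abstract describes more concrete, here is a minimal illustrative sketch in Python (NumPy/SciPy), not the authors' exact algorithm: it forms a rank-based correlation estimate from Kendall's tau via the usual sin(pi*tau/2) bridge, repairs it to be positive semidefinite by clipping negative eigenvalues (a simple surrogate for the paper's projection step, which is handled differently there), and then runs lasso-style neighborhood selection using only the correlation matrix. The function names, penalty level, and thresholds are assumptions chosen for illustration, and no theoretical guarantees from the paper carry over to this simplification.

```python
# Illustrative sketch only: rank-based correlation estimation, a crude PSD
# repair, and neighborhood-pursuit-style graph recovery. Not the paper's
# exact procedure; names and tuning values below are hypothetical.
import numpy as np
from scipy.stats import kendalltau

def rank_correlation(X):
    """Kendall's tau correlations mapped to the latent Pearson scale."""
    n, d = X.shape
    R = np.eye(d)
    for j in range(d):
        for k in range(j + 1, d):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(np.pi * tau / 2.0)
    return R

def project_psd(R):
    """Clip negative eigenvalues and rescale the diagonal back to one."""
    w, V = np.linalg.eigh((R + R.T) / 2.0)
    R_psd = (V * np.clip(w, 0.0, None)) @ V.T
    d_inv = 1.0 / np.sqrt(np.diag(R_psd))
    return d_inv[:, None] * R_psd * d_inv[None, :]

def neighborhood_lasso(R, lam, n_iter=200):
    """Coordinate-descent lasso per node, driven only by the correlation matrix."""
    d = R.shape[0]
    adjacency = np.zeros((d, d), dtype=bool)
    for j in range(d):
        idx = [k for k in range(d) if k != j]
        S = R[np.ix_(idx, idx)]      # Gram matrix among the other nodes
        r = R[idx, j]                # correlations with the response node
        beta = np.zeros(d - 1)
        for _ in range(n_iter):
            for a in range(d - 1):
                resid = r[a] - S[a] @ beta + S[a, a] * beta[a]
                beta[a] = np.sign(resid) * max(abs(resid) - lam, 0.0) / S[a, a]
        adjacency[j, idx] = np.abs(beta) > 1e-8
    return adjacency | adjacency.T   # symmetrize with the "OR" rule

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))       # placeholder data
    R = project_psd(rank_correlation(X))
    print(neighborhood_lasso(R, lam=0.3).astype(int))
```

The eigenvalue clipping guarantees a valid (positive semidefinite) correlation input for the per-node lasso regressions, which is the structural issue the paper addresses; the paper's own smoothed projection and its statistical analysis are more refined than this sketch.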
