High-dimensional Ising model selection using ℓ1-regularized logistic regression

We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an $\ell_1$-constraint. The method is analyzed under high-dimensional scaling, in which both the number of nodes $p$ and the maximum neighborhood size $d$ are allowed to grow as a function of the number of observations $n$. Our main results provide sufficient conditions on the triple $(n,p,d)$ and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. Under coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes $n=\Omega(d^3\log p)$, with error probability decaying exponentially. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of $n=\Omega(d^2\log p)$ suffices for the method to estimate neighborhoods consistently. Although this paper focuses on binary graphical models, we indicate how a generalization of the method would apply to general discrete Markov random fields.
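The method rests on a simple fact: for an Ising model $\mathbb{P}_\theta(x) \propto \exp\bigl(\sum_{(s,t)\in E} \theta_{st} x_s x_t\bigr)$ with $x \in \{-1,+1\}^p$, the conditional distribution of any node given the rest is logistic, $\mathbb{P}(x_s = 1 \mid x_{\setminus s}) = 1/\bigl(1 + \exp(-2\sum_{t \neq s} \theta_{st} x_t)\bigr)$, so the nonzero coefficients of an $\ell_1$-penalized logistic regression of node $s$ on the remaining nodes estimate the neighborhood of $s$. The following is a minimal sketch of this pipeline, using scikit-learn's solver as an off-the-shelf stand-in for the paper's estimator; the function name, the thresholding tolerance, and the exact calibration of the penalty are illustrative assumptions, not part of the paper.

```python
# Sketch: neighborhood selection for a binary Ising model via
# l1-regularized logistic regression. Hypothetical helper; all
# names and parameter choices here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_neighborhoods(X, reg_strength, tol=1e-6):
    """X: (n, p) array of observed spins in {-1, +1}.
    reg_strength: l1 penalty weight lambda; the paper's analysis
    suggests a scaling lambda ~ sqrt(log p / n), up to constants.
    Returns a dict mapping each node index to its estimated neighbor set."""
    n, p = X.shape
    neighbors = {}
    for s in range(p):
        y = (X[:, s] > 0).astype(int)      # node s as the binary response
        Z = np.delete(X, s, axis=1)        # remaining p-1 nodes as covariates
        # sklearn minimizes ||w||_1 + C * sum(log-loss), so setting
        # C = 1 / (n * lambda) matches a penalized average log-loss.
        clf = LogisticRegression(penalty="l1", solver="liblinear",
                                 C=1.0 / (n * reg_strength))
        clf.fit(Z, y)
        w = clf.coef_.ravel()
        others = [t for t in range(p) if t != s]
        # Keep covariates whose estimated coefficient is (numerically) nonzero.
        neighbors[s] = {others[j] for j in np.flatnonzero(np.abs(w) > tol)}
    return neighbors
```

Since each node's neighborhood is estimated separately, the per-node estimates need not be symmetric; a graph estimate can then be formed by combining them across each pair, declaring an edge $(s,t)$ when $t$ appears in the estimated neighborhood of $s$ AND (or OR) $s$ appears in that of $t$.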
