Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression: A Statistical Mechanics Analysis

We theoretically analyze the typical learning performance of $\ell_1$-regularized linear regression ($\ell_1$-LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of $\ell_1$-LinR is obtained. Remarkably, despite the model misspecification, $\ell_1$-LinR is model-selection consistent with the same order of sample complexity as $\ell_1$-regularized logistic regression ($\ell_1$-LogR), i.e., $M = \mathcal{O}(\log N)$, where $N$ is the number of variables of the Ising model. Moreover, we provide an efficient method to accurately predict the non-asymptotic behavior of $\ell_1$-LinR for moderate $M, N$, such as the precision and recall. Simulations show a fairly good agreement between theoretical predictions and experimental results, even for graphs with many loops, which supports our findings. Although this paper mainly focuses on $\ell_1$-LinR, our method is readily applicable for precisely characterizing the typical learning performances of a wide class of $\ell_1$-regularized $M$-estimators, including $\ell_1$-LogR and interaction screening.
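The $\ell_1$-LinR estimator admits a compact illustration: regress each spin on all the others under a lasso penalty and read the estimated neighborhood off the support of the solution. The sketch below is not the paper's implementation; it assumes a small ring (2-regular) Ising model well inside the paramagnetic phase, single-site Gibbs sampling to generate data, and a plain coordinate-descent lasso solver, all chosen for self-containedness.

```python
import numpy as np

def gibbs_sample_ising(J, M, burn=200, seed=0):
    """Draw M spin configurations from a zero-field Ising model with
    coupling matrix J via sweeps of single-site Gibbs sampling."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    s = rng.choice([-1, 1], size=N)
    samples = np.empty((M, N), dtype=int)
    for sweep in range(burn + M):
        for i in range(N):
            h = J[i] @ s  # local field (J[i, i] == 0)
            s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * h)) else -1
        if sweep >= burn:
            samples[sweep - burn] = s
    return samples

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_w (1/2M)||y - Xw||^2 + lam * ||w||_1.
    Assumes +/-1 columns, so each column of X has squared norm M."""
    M, d = X.shape
    w = np.zeros(d)
    r = y - X @ w
    for _ in range(n_iter):
        for j in range(d):
            rho = X[:, j] @ r / M + w[j]
            w_new = np.sign(rho) * max(abs(rho) - lam, 0.0)
            r += X[:, j] * (w[j] - w_new)  # keep residual in sync
            w[j] = w_new
    return w

def l1_linr_neighborhoods(samples, lam=0.1):
    """l1-LinR: regress spin i on all other spins and keep the lasso support."""
    M, N = samples.shape
    adj = np.zeros((N, N), dtype=bool)
    for i in range(N):
        others = [j for j in range(N) if j != i]
        w = lasso_cd(samples[:, others].astype(float),
                     samples[:, i].astype(float), lam)
        adj[i, others] = np.abs(w) > 1e-8
    return adj

# Toy experiment: a ring of N = 10 spins with coupling K = 0.4 (paramagnetic).
N, K = 10, 0.4
J = np.zeros((N, N))
for i in range(N):
    J[i, (i + 1) % N] = J[(i + 1) % N, i] = K
samples = gibbs_sample_ising(J, M=3000)
adj_hat = l1_linr_neighborhoods(samples, lam=0.1)
adj_true = J != 0
```

Despite the squared loss being misspecified for $\pm 1$ data, the supports recovered this way match the true ring for moderate sample sizes, which is the phenomenon the paper quantifies; symmetrizing the per-node estimates (keeping an edge only if both endpoints select it) is a common postprocessing step.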
