Ising Model Selection Using $\ell_{1}$-Regularized Linear Regression: A Statistical Mechanics Analysis

We theoretically analyze the typical learning performance of $\ell_1$-regularized linear regression ($\ell_1$-LinR) for Ising model selection using the replica method from statistical mechanics. For typical random regular graphs in the paramagnetic phase, an accurate estimate of the typical sample complexity of $\ell_1$-LinR is obtained. Remarkably, despite the model misspecification, $\ell_1$-LinR is model-selection consistent with the same order of sample complexity as $\ell_1$-regularized logistic regression ($\ell_1$-LogR), i.e., $M = \mathcal{O}(\log N)$, where $N$ is the number of variables of the Ising model. Moreover, we provide an efficient method to accurately predict the non-asymptotic behavior of $\ell_1$-LinR for moderate $M, N$, such as the precision and recall. Simulations show a fairly good agreement between theoretical predictions and experimental results, even for graphs with many loops, which supports our findings. Although this paper mainly focuses on $\ell_1$-LinR, our method is readily applicable for precisely characterizing the typical learning performances of a wide class of $\ell_1$-regularized $M$-estimators, including $\ell_1$-LogR and interaction screening.
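The $\ell_1$-LinR estimator admits a compact illustration: regress each spin on all the others under a lasso penalty and read the estimated neighborhood off the support of the solution. The sketch below is not the paper's implementation; it assumes a small ring (2-regular) Ising model well inside the paramagnetic phase, single-site Gibbs sampling to generate data, and a plain coordinate-descent lasso solver, all chosen for self-containedness.

```python
import numpy as np

def gibbs_sample_ising(J, M, burn=200, seed=0):
    """Draw M spin configurations from a zero-field Ising model with
    coupling matrix J via sweeps of single-site Gibbs sampling."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    s = rng.choice([-1, 1], size=N)
    samples = np.empty((M, N), dtype=int)
    for sweep in range(burn + M):
        for i in range(N):
            h = J[i] @ s  # local field (J[i, i] == 0)
            s[i] = 1 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * h)) else -1
        if sweep >= burn:
            samples[sweep - burn] = s
    return samples

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for min_w (1/2M)||y - Xw||^2 + lam * ||w||_1.
    Assumes +/-1 columns, so each column of X has squared norm M."""
    M, d = X.shape
    w = np.zeros(d)
    r = y - X @ w
    for _ in range(n_iter):
        for j in range(d):
            rho = X[:, j] @ r / M + w[j]
            w_new = np.sign(rho) * max(abs(rho) - lam, 0.0)
            r += X[:, j] * (w[j] - w_new)  # keep residual in sync
            w[j] = w_new
    return w

def l1_linr_neighborhoods(samples, lam=0.1):
    """l1-LinR: regress spin i on all other spins and keep the lasso support."""
    M, N = samples.shape
    adj = np.zeros((N, N), dtype=bool)
    for i in range(N):
        others = [j for j in range(N) if j != i]
        w = lasso_cd(samples[:, others].astype(float),
                     samples[:, i].astype(float), lam)
        adj[i, others] = np.abs(w) > 1e-8
    return adj

# Toy experiment: a ring of N = 10 spins with coupling K = 0.4 (paramagnetic).
N, K = 10, 0.4
J = np.zeros((N, N))
for i in range(N):
    J[i, (i + 1) % N] = J[(i + 1) % N, i] = K
samples = gibbs_sample_ising(J, M=3000)
adj_hat = l1_linr_neighborhoods(samples, lam=0.1)
adj_true = J != 0
```

Despite the squared loss being misspecified for $\pm 1$ data, the supports recovered this way match the true ring for moderate sample sizes, which is the phenomenon the paper quantifies; symmetrizing the per-node estimates (keeping an edge only if both endpoints select it) is a common postprocessing step.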
