Sample-efficient L0-L2 constrained structure learning of sparse Ising models

We consider the problem of learning the underlying graph of a sparse Ising model with $p$ nodes from $n$ i.i.d. samples. The most recent and best-performing approaches combine an empirical loss (the logistic regression loss or the interaction screening loss) with a regularizer (an L1 penalty or an L1 constraint), yielding a convex problem that can be solved separately for each node of the graph. In this work, we leverage an L0 cardinality constraint, which is known to induce sparsity directly, and combine it with an L2 constraint to better model the non-zero coefficients. We show that the proposed estimators achieve an improved sample complexity over their L1-based counterparts, both (a) theoretically, by reaching new state-of-the-art upper bounds for recovery guarantees, and (b) empirically, by exhibiting sharper phase transitions between poor and full recovery on graph topologies studied in the literature.
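
To make the node-wise formulation concrete, below is a minimal Python sketch of the L0-L2 constrained estimator. It assumes samples in $\{-1, +1\}^{n \times p}$ and solves each node's constrained logistic regression with a simple iterative hard-thresholding (projected gradient) loop; the function names, the OR symmetrization rule, and all hyperparameters (`k`, `l2_radius`, `step`, `iters`) are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal illustrative sketch of node-wise Ising structure learning.
# Assumes samples X in {-1, +1}^{n x p}; the IHT-style solver and all
# hyperparameters below are assumptions, not the paper's algorithm.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def logistic_loss_grad(w, Z, y):
    """Gradient of the mean logistic loss for labels y in {-1, +1}."""
    sigma = expit(-y * (Z @ w))           # P(mislabel) under current w
    return -(Z.T @ (y * sigma)) / len(y)

def l0l2_node_fit(Z, y, k, l2_radius, step=0.5, iters=500):
    """Projected gradient sketch: hard-threshold to the k largest
    coefficients (L0), then project onto the L2 ball of radius
    l2_radius (L2)."""
    w = np.zeros(Z.shape[1])
    for _ in range(iters):
        w = w - step * logistic_loss_grad(w, Z, y)
        support = np.argsort(np.abs(w))[-k:]   # keep best-k support
        pruned = np.zeros_like(w)
        pruned[support] = w[support]
        w = pruned
        norm = np.linalg.norm(w)
        if norm > l2_radius:                   # L2 ball projection
            w *= l2_radius / norm
    return w

def learn_ising_graph(X, k, l2_radius):
    """One sparse logistic regression per node; an edge {u, v} is
    recovered when either endpoint selects the other (OR rule)."""
    n, p = X.shape
    W = np.zeros((p, p))
    for u in range(p):
        others = np.delete(np.arange(p), u)
        W[u, others] = l0l2_node_fit(X[:, others], X[:, u], k, l2_radius)
    return np.abs(W) + np.abs(W.T) > 1e-8      # symmetrized support
```

The L1 baseline fits the same node-wise regressions with an L1 penalty instead, e.g. scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`; comparing the supports recovered by the two estimators as $n$ grows traces out the phase transitions mentioned above.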
