High Dimensional Logistic Regression Under Network Dependence

Abstract. Logistic regression is one of the most fundamental methods for modeling the probability of a binary outcome based on a collection of covariates. However, the classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure, such as over a temporal or spatial domain or on a social network. This necessitates the development of models that can simultaneously handle both the network ‘peer-effect’ (arising from neighborhood interactions) and the effect of (possibly) high-dimensional covariates. In this paper, we develop a framework for incorporating such dependencies in a high-dimensional logistic regression model by introducing a quadratic interaction term, as in the Ising model, designed to capture the pairwise interactions from the underlying network. The resulting model can also be viewed as an Ising model in which the node-dependent external fields linearly encode the high-dimensional covariates. We propose a penalized maximum pseudo-likelihood method for estimating the network peer-effect and the effect of the covariates (the regression coefficients), which, in addition to handling the high-dimensionality of the parameters, conveniently avoids the computational intractability of the maximum likelihood approach. Consequently, our method is computationally efficient and, under various standard regularity conditions, we show that the corresponding estimate attains the classical high-dimensional rate of consistency. In particular, our results imply that even under network dependence it is possible to consistently estimate the model parameters at the same rate as in classical (independent) logistic regression, when the true parameter is sparse and the underlying network is not too dense. As a consequence of the general results, we derive the rates of consistency of our proposed estimator for various natural graph ensembles, such as bounded degree graphs, sparse Erdős-Rényi random graphs, and stochastic block models.
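The penalized maximum pseudo-likelihood idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: outcomes are coded as spins x_i in {-1, +1}, the joint law is taken proportional to exp((beta/2) x'Ax + sum_i x_i theta'Z_i) for a symmetric adjacency matrix A with zero diagonal, the l1 penalty is placed on theta only, and the optimizer is plain proximal gradient descent. The names `neg_log_pseudolikelihood`, `soft_threshold`, and `fit_penalized_mpl` are hypothetical. Under this parametrization each conditional P(x_i | rest) is a logistic function of the local field beta * m_i + theta'Z_i, where m_i = sum_j A_ij x_j, so the objective needs no normalizing constant.

```python
import numpy as np


def neg_log_pseudolikelihood(beta, theta, x, A, Z):
    """Average negative log pseudo-likelihood for spins x in {-1, +1}^N.

    Assumed parametrization (illustrative, not from the paper):
    P(x_i | rest) = exp(x_i * eta_i) / (2 * cosh(eta_i)),
    with eta_i = beta * sum_j A_ij x_j + theta' Z_i.
    """
    m = A @ x                       # neighbourhood sums m_i
    eta = beta * m + Z @ theta      # local field at each node
    # log(2 cosh(eta)) computed stably as log(e^eta + e^-eta)
    return np.mean(np.logaddexp(eta, -eta) - x * eta)


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_penalized_mpl(x, A, Z, lam=0.05, n_iter=500):
    """Proximal gradient descent on the l1-penalized pseudo-likelihood.

    The penalty lam * ||theta||_1 acts on the regression coefficients
    only; the peer-effect beta is left unpenalized (an assumption).
    """
    N, d = Z.shape
    m = A @ x
    # The Hessian is dominated by W'W / N with W = [m, Z], so 1 / L with
    # L its largest eigenvalue is a safe step size.
    W = np.column_stack([m, Z])
    L = np.linalg.eigvalsh(W.T @ W / N)[-1]
    step = 1.0 / L
    beta, theta = 0.0, np.zeros(d)
    for _ in range(n_iter):
        eta = beta * m + Z @ theta
        r = np.tanh(eta) - x        # derivative of the loss in eta
        beta = beta - step * np.mean(r * m)
        theta = soft_threshold(theta - step * (Z.T @ r) / N, step * lam)
    return beta, theta
```

Each iteration costs one matrix-vector product with A and one with Z, whereas the full likelihood would require a normalizing constant summed over all 2^N spin configurations; this is the computational advantage the abstract refers to. The tuning parameter `lam` and the iteration count are placeholders, not recommended values.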
