Statistical Estimation from Dependent Data

We consider a general statistical estimation problem in which binary labels across different observations are not independent conditioned on their feature vectors, but dependent. This captures settings in which observations are collected on a spatial domain, a temporal domain, or a social network, any of which can induce dependencies. We model these dependencies in the language of Markov Random Fields and, importantly, allow them to be substantial; that is, we do not assume that the Markov Random Field capturing these dependencies is in the high-temperature regime. As our main contribution, we provide algorithms and statistically efficient estimation rates for this model, and we instantiate our bounds in logistic regression, sparse logistic regression, and neural network settings with dependent data. Our estimation guarantees follow from novel results for estimating the parameters (i.e., external fields and interaction strengths) of Ising models from a single sample. We evaluate our estimation approach on real networked data, showing that, across three text classification datasets (Cora, Citeseer, and Pubmed), it outperforms standard regression approaches that ignore dependencies.
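
The abstract reduces estimation to learning the external fields and interaction strength of an Ising model from a single sample of labels. The following is a minimal sketch under stated assumptions, not the paper's estimator. It assumes a logistic-regression-style model in which node i has external field θ·x_i and a single interaction strength β couples neighbors on a known graph E, so that P(y) ∝ exp(Σ_i (θ·x_i) y_i + β Σ_{(i,j)∈E} y_i y_j) with y_i ∈ {-1, +1}. It fits (θ, β) by maximum pseudo-likelihood (MPLE), a standard single-sample technique, used here as an illustrative stand-in. MPLE maximizes Σ_i log P(y_i | y_{-i}), which is concave in (θ, β). The graph, parameter values, and all function names are hypothetical.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def log2cosh(m):
    """Numerically stable log(2 cosh m) = |m| + log(1 + exp(-2|m|))."""
    a = abs(m)
    return a + math.log1p(math.exp(-2.0 * a))


def sample_ising_labels(features, neighbors, theta, beta, sweeps=200, seed=0):
    """Gibbs-sample y in {-1,+1}^n from the ASSUMED model
    P(y) ∝ exp(sum_i (theta . x_i) y_i + beta * sum_{(i,j) in E} y_i y_j)."""
    rng = random.Random(seed)
    y = [rng.choice((-1, 1)) for _ in features]
    for _ in range(sweeps):
        for i, x in enumerate(features):
            field = dot(theta, x) + beta * sum(y[j] for j in neighbors[i])
            # P(y_i = +1 | all other labels) = 1 / (1 + exp(-2 * field))
            y[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-2.0 * field)) else -1
    return y


def design(features, neighbors, y):
    """Rows z_i = (x_i, sum_{j ~ i} y_j), so the local field is m_i = (theta, beta) . z_i."""
    return [list(x) + [float(sum(y[j] for j in nbrs))] for x, nbrs in zip(features, neighbors)]


def log_pseudo_likelihood(params, z, y):
    """Average of log P(y_i | y_-i) = y_i m_i - log(2 cosh m_i); concave in params."""
    return sum(yi * dot(params, zi) - log2cosh(dot(params, zi)) for zi, yi in zip(z, y)) / len(y)


def fit_mple(features, neighbors, y, iters=400):
    """Maximum pseudo-likelihood estimate of (theta, beta) by plain gradient ascent."""
    z = design(features, neighbors, y)
    n, k = len(z), len(z[0])
    # mean ||z_i||^2 upper-bounds the curvature, so a step of 1/L never lowers the objective.
    step = 1.0 / (sum(dot(zi, zi) for zi in z) / n)
    params = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for zi, yi in zip(z, y):
            r = yi - math.tanh(dot(params, zi))  # derivative of y*m - log(2 cosh m) in m
            for c in range(k):
                grad[c] += r * zi[c] / n
        params = [p + step * g for p, g in zip(params, grad)]
    return params


if __name__ == "__main__":
    rng = random.Random(1)
    n = 600
    features = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    # Hypothetical dependency graph: a cycle, so each node has exactly two neighbors.
    neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
    y = sample_ising_labels(features, neighbors, theta=[1.0, -0.5], beta=0.6)
    est = fit_mple(features, neighbors, y)
    print("theta_hat =", [round(t, 3) for t in est[:2]], "beta_hat =", round(est[2], 3))
```

Fixing β = 0 in this sketch gives ordinary logistic regression with independent labels, P(y_i = +1 | x_i) = 1/(1 + exp(-2 θ·x_i)). That is the kind of dependency-ignoring baseline the abstract compares against. The step size 1/L, with L = mean ‖z_i‖², is chosen so that each gradient step cannot decrease the pseudo-likelihood.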