Scalable graphical models for social networks

This thesis tackles the problem of efficiently learning large probabilistic models from sparse relational data. Dramatic recent increases in the collection of social network data, together with rapid growth in probabilistic and statistical approaches to tractable machine learning, have made it possible to analyze networks with millions of people. There are many questions one could ask about the formation, properties and dynamics of social networks. This thesis considers the following three questions: (1) given a set of interactions between people, what can be learned about the relations between these people without knowing the true underlying social network; (2) given additional information about each individual in the network, what can be done to improve our understanding of their relations; and (3) what dynamics underlie the formation and evolution of social networks?

We introduce new algorithms and models for learning about relations in a social network and about the evolution of those relations over time. We present a scalable search procedure for learning Bayesian Networks from binary event data; that is, the structure learning algorithm relies solely on information about which people participated in which of a given set of events. We present learning results on very large Bayesian Networks (up to three million nodes) and show how they can be used to learn more about the underlying social networks. We extend this model by incorporating information about individuals, such as their affiliations and interests, and use block modeling both to improve the quality of our Bayesian Networks and to learn more about group interaction patterns. Finally, we introduce a generative mechanism that offers an explanation of social network evolution; this dynamic generative model is exploratory in nature. The models and learning algorithms described here have one thing in common: they are all motivated by real-life phenomena.
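The abstract does not spell out the search procedure itself. As a generic illustration only, and not the algorithm used in this thesis, the sketch below shows how a Bayesian Network family score can be computed directly from binary event data, where each event is simply the set of people who took part and each person is a binary variable (present or absent). It assumes a BDeu score with equivalent sample size `ess`, restricts candidate parents to people who co-occur with the child at least once (one simple way to exploit sparsity), and greedily adds parents to a single node. All function names and the toy data are hypothetical, and acyclicity across nodes is not enforced.

```python
from collections import Counter, defaultdict
from math import lgamma


def build_index(events):
    """Map each person to the set of event ids they took part in."""
    index = defaultdict(set)
    for e_id, people in enumerate(events):
        for p in people:
            index[p].add(e_id)
    return index


def family_counts(child, parents, events, index):
    """Count (parent configuration, child value) pairs over all events.

    Only events involving the child or one of its parents are scanned;
    every other event has the all-absent configuration with child absent.
    """
    touched = set(index[child])
    for p in parents:
        touched |= index[p]
    counts = Counter()
    for e_id in touched:
        people = events[e_id]
        config = tuple(p in people for p in parents)
        counts[(config, child in people)] += 1
    untouched = len(events) - len(touched)
    if untouched:
        counts[(tuple(False for _ in parents), False)] += untouched
    return counts


def bdeu_score(child, parents, events, index, ess=1.0):
    """BDeu log score of a binary child given a list of binary parents."""
    q = 2 ** len(parents)          # number of parent configurations
    a_j, a_jk = ess / q, ess / (2 * q)
    per_config = defaultdict(lambda: [0, 0])
    for (config, value), n in family_counts(child, parents, events, index).items():
        per_config[config][int(value)] += n
    score = 0.0                    # unobserved configurations contribute 0
    for n0, n1 in per_config.values():
        score += lgamma(a_j) - lgamma(a_j + n0 + n1)
        score += lgamma(a_jk + n0) - lgamma(a_jk)
        score += lgamma(a_jk + n1) - lgamma(a_jk)
    return score


def greedy_parents(child, events, index, max_parents=2, ess=1.0):
    """Greedily add the parent that most improves the family score."""
    # Candidate parents: people who co-occur with the child at least once.
    candidates = {p for e in index[child] for p in events[e]} - {child}
    parents = []
    best = bdeu_score(child, parents, events, index, ess)
    while len(parents) < max_parents:
        scored = [(bdeu_score(child, parents + [c], events, index, ess), c)
                  for c in sorted(candidates - set(parents))]
        if not scored:
            break
        top_score, top = max(scored)
        if top_score <= best:      # stop when no candidate helps
            break
        parents.append(top)
        best = top_score
    return parents, best


# Hypothetical toy data: each event is the set of people who attended it.
events = [frozenset(s) for s in (
    {"ann", "bob"}, {"ann", "bob", "cat"}, {"bob"}, {"cat", "dan"},
    {"ann", "bob"}, {"dan"}, {"cat", "dan"}, set())]
index = build_index(events)
print(greedy_parents("ann", events, index))
```

On this toy data the search adds `bob` as a parent of `ann` and then stops, because also adding `cat` lowers the BDeu score. Counting only the events that touch a family keeps each score evaluation proportional to the activity of a few people rather than to the total number of events, which is the kind of saving that matters when the data are sparse.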