Subgraph frequencies: mapping the empirical and extremal geography of large graph collections

A growing set of online applications is generating data that can be viewed as very large collections of small, dense social graphs; these range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods in large social networks. A natural question is how to usefully define a domain-independent 'coordinate system' for such a collection of graphs, so that the set of possible structures can be compactly represented and understood within a common space. In this work, we draw on the theory of graph homomorphisms to formulate and analyze such a representation, based on computing the frequencies of small induced subgraphs within each graph. We find that the space of subgraph frequencies is governed both by its combinatorial properties, based on extremal results that constrain all graphs, and by its empirical properties, manifested in the way that real social graphs appear to lie near a simple one-dimensional curve through this space. We develop flexible frameworks for studying each of these aspects. For capturing empirical properties, we characterize a simple stochastic generative model, a single-parameter extension of Erdős-Rényi random graphs, whose stationary distribution over subgraphs closely tracks the one-dimensional concentration of the real social graph families. For the extremal properties, we develop a tractable linear program for bounding the feasible space of subgraph frequencies by harnessing a toolkit of known results from extremal graph theory. Together, these two complementary frameworks shed light on a fundamental question pertaining to social graphs: what properties of social graphs are 'social' properties and what properties are 'graph' properties? We conclude with a brief demonstration of how the coordinate system we examine can also be used to perform classification tasks, distinguishing between structures arising from different types of social graphs.
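To make the notion of induced subgraph frequencies concrete, the following is a minimal sketch for the smallest nontrivial case, 3-node subgraphs of an undirected graph. It is an illustration under stated assumptions, not the authors' implementation: the function name `triad_frequencies` and the edge-list input format are hypothetical, and brute-force enumeration of all node triples is chosen only for clarity. For three nodes, the number of edges present (0, 1, 2, or 3) identifies the induced subgraph up to isomorphism, so the result is a 4-entry frequency vector that can serve as a graph's coordinates.

```python
from itertools import combinations


def triad_frequencies(nodes, edges):
    """Return the fraction of node triples inducing 0, 1, 2, or 3 edges.

    Hypothetical helper for illustration: `nodes` is an iterable of hashable
    node labels, `edges` an iterable of undirected (u, v) pairs.
    """
    # Store edges as unordered pairs so (u, v) and (v, u) match.
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for trio in combinations(nodes, 3):
        # Count how many of the three possible pairs are edges.
        k = sum(frozenset(pair) in edge_set for pair in combinations(trio, 2))
        counts[k] += 1
    total = sum(counts)
    # Graphs with fewer than 3 nodes have no triples; return zeros then.
    return [c / total for c in counts] if total else counts


# A triangle 0-1-2 with a pendant node 3 attached to node 2.
freqs = triad_frequencies(range(4), [(0, 1), (1, 2), (0, 2), (2, 3)])
print(freqs)
```

Each graph in a collection maps to one such vector, so a collection becomes a point cloud in a common low-dimensional space. Enumerating all triples costs time cubic in the number of nodes, which is acceptable only for the small graphs this sketch targets.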
