Estimating clique composition and size distributions from sampled network data

Cliques, defined as complete graphs or subgraphs, are the strongest form of cohesive subgroup and are of interest in both social science and engineering contexts. In this paper we show how to efficiently estimate the distribution of clique sizes from a probability sample of nodes obtained from a graph (e.g., by independence or link-trace sampling). We introduce two types of unbiased estimators: one exploits the labeling of sampled nodes' neighbors, and the other does not require this information. This is the first work to present statistically principled, design-based estimators for clique distributions in arbitrary graphs using sampled network data. We generalize our estimators to cases in which cliques are distinguished not only by size but also by node attributes, which allows us to estimate clique composition by size. Finally, we compare our estimators on a variety of real-world graphs and provide suggestions for their use.
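
To make the idea of a design-based estimator concrete, the following is a minimal sketch and not the estimators developed in the paper. It assumes nodes are drawn independently with replacement with known per-draw probabilities, and that each draw reveals the sampled node's ego network. It applies a standard Hansen-Hurwitz-style construction to counts of (not necessarily maximal) cliques: each clique of size k is contained in the ego networks of exactly k nodes, so dividing by k times the draw probability yields an unbiased estimate of the number of k-cliques. The names `cliques_containing`, `estimate_clique_counts`, `adj`, `draws`, and `draw_prob` are hypothetical and introduced only for illustration.

```python
from itertools import combinations


def cliques_containing(adj, v, k):
    """Count the k-cliques that include node v, using only v's ego network.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    if k == 1:
        return 1
    count = 0
    # A k-clique containing v is v plus k-1 mutually adjacent neighbors of v.
    for others in combinations(sorted(adj[v]), k - 1):
        if all(b in adj[a] for a, b in combinations(others, 2)):
            count += 1
    return count


def estimate_clique_counts(adj, draws, draw_prob, max_k):
    """Hansen-Hurwitz-style estimate of the number of k-cliques, k = 1..max_k.

    draws:     list of sampled nodes (independent draws, with replacement)
    draw_prob: dict giving each node's per-draw selection probability
    Assumption: only the ego networks of the drawn nodes are inspected.
    """
    n = len(draws)
    estimates = {}
    for k in range(1, max_k + 1):
        # Each k-clique is seen from k different nodes, hence the factor k.
        total = sum(
            cliques_containing(adj, v, k) / (k * draw_prob[v]) for v in draws
        )
        estimates[k] = total / n
    return estimates


# Toy graph: a triangle {0, 1, 2} with a pendant edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
uniform = {v: 1 / len(adj) for v in adj}

# Sanity check: if every node is drawn exactly once under uniform
# probabilities, the estimate coincides with the true clique counts.
est = estimate_clique_counts(adj, draws=[0, 1, 2, 3], draw_prob=uniform, max_k=4)
```

In this sketch only per-node clique counts are needed, not the identities of the neighbors, so it loosely corresponds to the setting in which neighbor labels are unavailable. Under link-trace designs such as a random walk, `draw_prob` would be replaced by the (possibly estimated) stationary probabilities, and the brute-force enumeration is practical only for small ego networks.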
