Model Selection for Social Networks Using Graphlets

Several network models have been proposed to explain the link structure observed in online social networks. This paper addresses the problem of choosing the model that best fits a given real-world network. We implement a model-selection method based on supervised learning: an alternating decision tree is trained on synthetic graphs generated according to each of the models under consideration. We use a broad array of features, chosen to represent different structural aspects of the network. These include the frequency counts of small subgraphs (graphlets) as well as features capturing the degree distribution and the small-world property. Our method correctly classifies synthetic graphs and is robust under perturbations of the graphs. We show that graphlet counts alone suffice to separate the training data, indicating that they capture network structure well. We tested our approach on four Facebook graphs from American universities. The models that best fit these data are those based on the principle of preferential attachment.
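
To make the notion of a graphlet count concrete, the following is a minimal sketch in Python, not the implementation used in this work. It counts the two connected induced subgraphs on three nodes (open paths and triangles) in an undirected simple graph. The function name `count_3node_graphlets` and the adjacency-dictionary representation are assumptions made for illustration, and the sketch does not cover larger graphlets.

```python
from itertools import combinations


def count_3node_graphlets(adj):
    """Count induced connected 3-node subgraphs of an undirected simple graph.

    adj maps each node to the set of its neighbours (hypothetical input
    format chosen for this sketch). Returns (open_paths, triangles).
    """
    wedges = 0         # neighbour pairs around a centre node, closed or not
    closed_wedges = 0  # wedges whose two end nodes are also adjacent
    for centre, neighbours in adj.items():
        for a, b in combinations(neighbours, 2):
            wedges += 1
            if b in adj[a]:
                closed_wedges += 1
    # Every triangle is seen once from each of its three corners.
    triangles = closed_wedges // 3
    # An induced open path is a wedge whose end nodes are not adjacent.
    open_paths = wedges - closed_wedges
    return open_paths, triangles


# A triangle on nodes 0, 1, 2 with a pendant node 3 attached to node 2.
example = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(count_3node_graphlets(example))  # (2, 1)
```

Counts of this kind, computed for each synthetic graph, would form part of the feature vector on which a classifier is trained.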
