Collaboration over time: characterizing and modeling network evolution

Coauthorship is a formal type of scientific and academic collaboration that can be represented as a coauthorship network. Coauthorship networks are among the largest social networks and offer an opportunity to study the mechanisms underlying large-scale real-world networks. We construct such a network for the Computer Science field covering research collaborations from 1980 to 2005, based on a large dataset of 451,305 papers authored by 283,174 distinct researchers. By mining this network, we first present a comprehensive longitudinal study of its statistical properties, both at the overall network level and at the intermediate community level. The major observations are that the database community is the best connected while the AI community is the most assortative, and that the Computer Science field as a whole shows a collaboration pattern more similar to Mathematics than to Biology. Moreover, the small-world phenomenon and the scale-free degree distribution accompany the growth of the network. To study individual collaborations, we propose a novel stochastic model, Stochastic Poisson model with Optimization Tree (Spot), to efficiently predict any increment of collaboration based on the local neighborhood structure. Spot models the non-stationary Poisson process by maximizing the log-likelihood with a tree structure. Empirical results show that Spot outperforms Support Vector Regression, both in fitting collaboration records and in predicting the rate of collaboration.
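
The abstract does not spell out how Spot builds its tree, so the following is only a minimal sketch of the general idea of choosing a tree split by maximizing a Poisson log-likelihood. It is not the authors' algorithm. The function names (`leaf_loglik`, `best_split`), the single-feature setup, and the toy data are all assumptions made for illustration: each author pair has one local-neighborhood feature (for example, a count of common coauthors), an observed number of new collaborations, and a positive observation window ("exposure").

```python
import math

def leaf_loglik(counts, exposures):
    """Poisson log-likelihood of one leaf at its maximum-likelihood rate.

    Each count k_i is modeled as Poisson(rate * t_i), where t_i > 0 is the
    exposure. The MLE of the shared rate is sum(k) / sum(t).
    """
    total_k = sum(counts)
    total_t = sum(exposures)
    if total_k == 0:
        # MLE rate is 0 and every count is 0, so the likelihood is exactly 1.
        return 0.0
    rate = total_k / total_t
    return sum(
        k * math.log(rate * t) - rate * t - math.lgamma(k + 1)
        for k, t in zip(counts, exposures)
    )

def best_split(features, counts, exposures):
    """Return (threshold, loglik) for the best single split 'feature <= threshold'.

    The threshold is None when no split improves on the unsplit leaf.
    """
    best_thr, best_ll = None, leaf_loglik(counts, exposures)
    # Candidate thresholds: every distinct feature value except the largest,
    # so that both children are non-empty.
    for thr in sorted(set(features))[:-1]:
        left = [i for i, x in enumerate(features) if x <= thr]
        right = [i for i, x in enumerate(features) if x > thr]
        ll = (
            leaf_loglik([counts[i] for i in left], [exposures[i] for i in left])
            + leaf_loglik([counts[i] for i in right], [exposures[i] for i in right])
        )
        if ll > best_ll:
            best_thr, best_ll = thr, ll
    return best_thr, best_ll

# Hypothetical toy data: feature = common coauthors, count = new joint papers.
features = [0, 0, 1, 1, 5, 6]
counts = [0, 1, 0, 1, 6, 7]
exposures = [1.0] * 6
threshold, loglik = best_split(features, counts, exposures)
```

Applying `best_split` recursively to each child would grow a full tree with one piecewise-constant collaboration rate per leaf. How Spot handles the non-stationary (time-varying) rate is not described in the abstract and is not modeled in this sketch.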
