One in Four Is Enough – Strategies for Selecting Ego Mailboxes for a Group Network View

Researchers have recently started analyzing the e-mail archives of individuals and groups as an approximation of social ties. Complete e-mail archives covering all exchanges within a group of individuals can be hard to obtain; frequently, only the mailboxes of a subset of the analyzed actors are available. In this project we report on experiments to find the ego networks (i.e., mailboxes) that give a “reasonably” complete picture of the full social group network. We also report on the stability of social network metrics when the network is incomplete. We collected the complete individual mailboxes of 53 researchers over a period of 20 weeks. The researchers work in the same lab and collaborate on different research and educational projects. We ran a series of simulations to identify the best strategies and metrics for the analysis of incomplete e-mail networks. Applying snowball sampling and subsequently adding more members of the group, we compared three selection strategies: a globally optimal strategy, which adds the next-best member with respect to the chosen metric; a locally best strategy, which adds the next-best member within the already known network; and a random selection strategy. As sampling metrics we used individual and group betweenness centrality, group density, the number of nodes and edges, and others. We categorized ego networks by the roles of the individual actors: lab manager, project and subproject managers, and project contributors. Lab managers and project managers are in the core of the group network, while individual contributors are in its periphery. The results show that good approximations of group network structures are already obtained with 25% to 30% of the mailboxes of the community.
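
The three selection strategies can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy network, and the use of edge coverage (the share of all edges visible in the selected mailboxes) as the sampling metric are assumptions made for illustration. Edge coverage stands in for the metrics named in the abstract, such as betweenness centrality or density.

```python
import random

def build_mailboxes(edge_list):
    """Hypothetical helper: each ego's mailbox holds the edges that ego takes part in."""
    boxes = {}
    for a, b in edge_list:
        edge = frozenset((a, b))
        boxes.setdefault(a, set()).add(edge)
        boxes.setdefault(b, set()).add(edge)
    return boxes

def observed_edges(mailboxes, selected):
    """Union of all edges visible in the selected mailboxes."""
    edges = set()
    for ego in selected:
        edges |= mailboxes[ego]
    return edges

def edge_coverage(mailboxes, selected):
    """Assumed sampling metric: fraction of the full network's edges observed."""
    full = observed_edges(mailboxes, mailboxes)
    if not full:
        return 1.0
    return len(observed_edges(mailboxes, selected)) / len(full)

def select_mailboxes(mailboxes, seed, k, strategy="global", rng=None):
    """Grow a sample of k mailboxes, starting from one seed mailbox.

    strategy="global": add the mailbox that maximizes the metric, chosen
                       among all remaining mailboxes.
    strategy="local":  same, but only among egos already visible as nodes
                       in the network observed so far (snowball-style).
    strategy="random": add a uniformly random remaining mailbox.
    """
    rng = rng or random.Random(0)
    selected = [seed]
    while len(selected) < k:
        remaining = [ego for ego in mailboxes if ego not in selected]
        if not remaining:
            break
        if strategy == "random":
            choice = rng.choice(remaining)
        else:
            candidates = remaining
            if strategy == "local":
                known = {n for edge in observed_edges(mailboxes, selected) for n in edge}
                # Fall back to all remaining egos if the known network is exhausted.
                candidates = [ego for ego in remaining if ego in known] or remaining
            choice = max(candidates,
                         key=lambda ego: edge_coverage(mailboxes, selected + [ego]))
        selected.append(choice)
    return selected

# Toy network (invented): one manager, two project leads, four contributors.
toy_edges = [("mgr", "p1"), ("mgr", "p2"), ("mgr", "c1"), ("p1", "c1"),
             ("p1", "c2"), ("p2", "c3"), ("p2", "c4"), ("c3", "c4")]
boxes = build_mailboxes(toy_edges)

for strategy in ("global", "local", "random"):
    sample = select_mailboxes(boxes, seed="c2", k=3, strategy=strategy)
    print(strategy, sample, edge_coverage(boxes, sample))
```

On this toy network, starting from the peripheral mailbox `c2` and selecting three mailboxes, the global strategy observes 6 of the 8 edges and the local strategy 5 of 8. The gap arises because the local strategy can only reach egos that already appear in the known network. Swapping `edge_coverage` for another metric only requires passing a different scoring function into the `max` call.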
