Measuring the effect of node aggregation on community detection

The nodes of a complex network are often aggregated, whether deliberately or not, because of technical, ethical, or legal limitations, or for privacy reasons. A common example is geographic position: one may uncover communities in a network of places, or of individuals identified with their typical geographical position, and then aggregate these places into larger entities, such as municipalities, thus obtaining another network. The communities found in the networks obtained at different levels of aggregation may exhibit varying degrees of similarity, from full alignment to complete independence. This is akin to the problem of ecological and atomic fallacies in statistics, or to the Modifiable Areal Unit Problem in geography. We identify the class of community detection algorithms best suited to cope with node aggregation, and develop an index of aggregability that captures the extent to which aggregation preserves the community structure. We illustrate its relevance on real-world examples (mobile phone and Twitter reply-to networks). Our main message is that any node-partitioning analysis performed on aggregated networks should be interpreted with caution, as the outcome may be strongly influenced by the level of aggregation.
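To make the aggregation step concrete, the following minimal Python sketch merges the nodes of a small weighted network into larger units and checks how well a fine-level community assignment survives the merge. It is an illustration only: the function names, the toy network, and the "majority purity" score are assumptions introduced here, and the score is a simple stand-in rather than the aggregability index developed in the paper.

```python
from collections import Counter, defaultdict


def aggregate_edges(edges, unit_of):
    """Sum edge weights between aggregate units.

    Edges inside a unit become self-loops on that unit.
    """
    agg = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((unit_of[u], unit_of[v]))
        agg[(a, b)] += w
    return dict(agg)


def majority_purity(community_of, unit_of):
    """Fraction of nodes whose community is the most common one in their unit.

    Hypothetical score (not the paper's index): 1.0 means every unit lies
    entirely inside one fine-level community.
    """
    labels_by_unit = defaultdict(list)
    for node, unit in unit_of.items():
        labels_by_unit[unit].append(community_of[node])
    matched = sum(
        Counter(labels).most_common(1)[0][1]
        for labels in labels_by_unit.values()
    )
    return matched / len(unit_of)


# Toy network: two triangles joined by one edge (illustrative data).
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("c", "d", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# Two possible aggregations of the same six nodes.
aligned = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y", "f": "Y"}
misaligned = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Z", "f": "Z"}

print(aggregate_edges(edges, aligned))
print(majority_purity(community_of, aligned))
print(majority_purity(community_of, misaligned))
```

In the aligned case each unit sits inside a single community, so the aggregated network can still express the original partition. In the misaligned case unit Y straddles both communities, so no partition of the aggregated network can reproduce the fine-level one exactly.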
