ClueNet: Clustering a temporal network based on topological similarity rather than denseness

Network clustering is a popular topic in network science. Its goal is to divide (partition) a network into groups (clusters or communities) of “topologically related” nodes, where the resulting topology-based clusters are expected to “correlate” well with node label information, i.e., metadata, such as the cellular functions of genes/proteins in biological networks, or the age or gender of people in social networks. Even for static data, network clustering is a complex problem. For dynamic data, it is even more complex because of an additional dimension of the data: their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches are needed. Existing approaches for dynamic network clustering (DNC) have two drawbacks. First, they assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster nodes that are topologically similar to each other instead of, or in addition to, requiring the nodes to be densely interconnected. Second, they ignore temporal information in their early steps, and when they do consider this information later on, they do so only implicitly. We hypothesize that capturing temporal information earlier in the clustering process, and doing so explicitly, will improve results. We test these two hypotheses via our new approach, called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior to the existing methods in over 83% of all evaluation tests. As more real-world dynamic data become available, DNC, and thus ClueNet, will only continue to gain importance.
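To make the contrast between denseness and topological similarity concrete, here is a minimal sketch in Python. It is not ClueNet's algorithm, which the abstract does not specify. It assumes a temporal network given as an ordered list of edge-list snapshots, describes each node explicitly per snapshot by its degree and the number of triangles it belongs to (simple stand-ins for richer topological descriptors), and greedily groups nodes whose feature vectors lie within a distance threshold. The function names node_features and cluster_by_similarity, the threshold value, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import dist


def node_features(snapshots, nodes):
    """Per-node vector of (degree, triangle count) for each snapshot, in time order.

    Hypothetical helper: concatenating per-snapshot values keeps temporal
    information explicit instead of collapsing snapshots into one graph.
    """
    feats = {v: [] for v in nodes}
    for edges in snapshots:
        adj = {v: set() for v in nodes}
        for u, w in edges:
            adj[u].add(w)
            adj[w].add(u)
        for v in nodes:
            # A triangle through v is a pair of v's neighbors that are adjacent.
            tri = sum(1 for p, q in combinations(adj[v], 2) if q in adj[p])
            feats[v].extend([len(adj[v]), tri])
    return feats


def cluster_by_similarity(feats, threshold=0.5):
    """Greedy grouping: each node joins the first cluster whose seed vector
    is within `threshold` (Euclidean distance); otherwise it seeds a new one.
    Whether two nodes are adjacent plays no role; only their feature vectors
    are compared."""
    clusters = []  # list of (seed_vector, member_list)
    for v in sorted(feats):
        for seed, members in clusters:
            if dist(seed, feats[v]) <= threshold:
                members.append(v)
                break
        else:
            clusters.append((feats[v], [v]))
    return [members for _, members in clusters]


# Toy temporal network (assumed data): two disjoint stars in snapshot 1;
# in snapshot 2 each hub and two of its leaves form a triangle.
snapshots = [
    [("a", "b"), ("a", "c"), ("a", "d"), ("x", "y"), ("x", "z"), ("x", "w")],
    [("a", "b"), ("a", "c"), ("b", "c"), ("x", "y"), ("x", "z"), ("y", "z")],
]
nodes = sorted({u for edges in snapshots for e in edges for u in e})
clusters = cluster_by_similarity(node_features(snapshots, nodes))
print(clusters)  # [['a', 'x'], ['b', 'c', 'y', 'z'], ['d', 'w']]
```

In this toy data, hubs a and x land in the same cluster even though they lie in different connected components, so a method that requires dense interconnection would not group them. Leaves d and w differ from the other leaves only in the second snapshot, a difference that the per-snapshot features keep visible.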
