Motif Mining in Weighted Networks

Unexpectedly frequent subgraphs, known as motifs, can help characterize the structure of complex networks. Most existing methods for finding motifs are designed for unweighted networks, where only the existence of a connection between two nodes is considered, not its strength or capacity. In many real-world networks, however, edges carry more information than simple node connectivity. In this paper, we propose a new method that incorporates edge weight information into motif mining. We regard a motif as a subgraph that carries unexpected information, and we define a new significance measure to assess this exceptionality. The proposed metric is based on weight entropy and captures how weights are distributed within a subgraph. We use the g-trie data structure to find instances of k-sized subgraphs and to calculate their significance scores. Following a statistical approach, we then calculate the random entropy of subgraphs, avoiding the time-consuming step of random network generation. We assess the discriminative power of the resulting motif profiles against that of traditional unweighted motifs on a graph classification problem, using a set of labeled co-authorship ego networks from the biology and mathematics fields. The proposed method proves feasible and even achieves slightly better accuracy. Because it does not require generating random networks, it is also computationally faster, and because it uses weight information when computing motif importance, weighted networks no longer need to be converted into unweighted ones.
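The abstract does not give the exact formula for the entropy-based significance score, so the following Python sketch only illustrates the general idea under stated assumptions. It assumes (1) that the weight entropy of a subgraph instance is the Shannon entropy of its edge weights normalized to sum to 1, (2) a null model in which each edge weight is drawn independently from the network's empirical weight distribution, so the expected ("random") entropy can be computed exactly without generating any random network, and (3) a hypothetical score equal to the mean observed entropy minus that expectation. For brevity it enumerates 3-node triangles by brute force instead of using g-tries. The helper names (`weight_entropy`, `triangle_instances`, `expected_random_entropy`, `entropy_deviation`) are illustrative and not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations, product


def weight_entropy(weights):
    """Shannon entropy (bits) of a subgraph's edge weights after normalizing
    them to sum to 1. Assumes all weights are positive."""
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs)


def triangle_instances(edges):
    """Yield the three edge weights of every triangle in an undirected graph
    given as {(u, v): weight}. Brute-force stand-in for g-trie matching."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            yield [adj[a][b], adj[a][c], adj[b][c]]


def expected_random_entropy(all_weights, k_edges):
    """Exact expected weight entropy of k_edges weights drawn i.i.d. (with
    replacement) from the network's empirical weight distribution.
    This null model is an assumption; no random networks are generated."""
    counts = Counter(all_weights)
    n = len(all_weights)
    expected = 0.0
    # Enumerate every ordered combination of distinct weight values.
    for combo in product(counts, repeat=k_edges):
        prob = 1.0
        for v in combo:
            prob *= counts[v] / n
        expected += prob * weight_entropy(combo)
    return expected


def entropy_deviation(edges):
    """Hypothetical significance score for triangles: mean observed weight
    entropy minus the expected entropy under the assumed null model.
    Returns None if the graph has no triangles."""
    instances = list(triangle_instances(edges))
    if not instances:
        return None
    observed = sum(weight_entropy(w) for w in instances) / len(instances)
    expected = expected_random_entropy(list(edges.values()), k_edges=3)
    return observed - expected


if __name__ == "__main__":
    # Toy weighted graph: one equal-weight triangle plus a heavier pendant edge.
    edges = {(1, 2): 1, (1, 3): 1, (2, 3): 1, (3, 4): 2}
    print(entropy_deviation(edges))
```

On the toy graph, the only triangle has equal weights, so its entropy is log2(3), about 1.585 bits, slightly above the null-model expectation; the script therefore prints a small positive value (about 0.045). Enumerating all |distinct weights|^k combinations is only practical for small subgraphs or coarsely binned weights, and the paper's own statistical derivation may differ from this null model.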
