Network Motif Discovery: A GPU Approach

The identification of network motifs has important applications in numerous domains, such as pattern detection in biological networks and graph analysis in digital circuits. However, mining network motifs is computationally challenging, as it requires enumerating subgraphs from a real-life graph, and computing the frequency of each subgraph in a large number of random graphs. In particular, existing solutions often require days to derive network motifs from biological networks with only a few thousand vertices. To address this problem, this paper presents a novel study on network motif discovery using Graphical Processing Units (GPUs). The basic idea is to employ GPUs to parallelize a large number of subgraph matching tasks in computing subgraph frequencies from random graphs, so as to reduce the overall computation time of network motif discovery. We explore the design space of GPU-based subgraph matching algorithms, with careful analysis of several crucial factors (such as branch divergences and memory coalescing) that affect the performance of GPU programs. Based on our analysis, we develop a GPU-based solution that (i) considerably differs from existing CPU-based methods in how it enumerates subgraphs, and (ii) exploits the strengths of GPUs in terms of parallelism while mitigating their limitations in terms of the computation power per GPU core. With extensive experiments on a variety of biological networks, we show that our solution is up to two orders of magnitude faster than the best CPU-based approach, and is around $20$ times more cost-effective than the latter, when taking into account the monetary costs of the CPU and GPUs used.

[1]  Tianyi David Han,et al.  Reducing branch divergence in GPU programs , 2011, GPGPU-4.

[2]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[3]  O. Sporns,et al.  Motifs in Brain Networks , 2004, PLoS biology.

[4]  Sourav S. Bhowmick,et al.  Efficient algorithms for generalized subgraph query processing , 2012, CIKM '12.

[5]  Jeffrey Xu Yu,et al.  Taming verification hardness: an efficient algorithm for testing subgraph isomorphism , 2008, Proc. VLDB Endow..

[6]  Ambuj K. Singh,et al.  GraphRank: Statistical Modeling and Mining of Significant Subgraphs in the Feature Space , 2006, Sixth International Conference on Data Mining (ICDM'06).

[7]  Ulf Assarsson,et al.  Fast parallel GPU-sorting using a hybrid algorithm , 2008, J. Parallel Distributed Comput..

[8]  Fernando M. A. Silva,et al.  Parallel discovery of network motifs , 2012, J. Parallel Distributed Comput..

[9]  Xing Xie,et al.  Network Motif Discovery: A GPU Approach , 2017, IEEE Trans. Knowl. Data Eng..

[10]  Sebastian Wernicke,et al.  FANMOD: a tool for fast network motif detection , 2006, Bioinform..

[11]  Ambuj K. Singh,et al.  GraphSig: A Scalable Approach to Mining Significant Subgraphs in Large Graph Databases , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[12]  Gang Wang,et al.  NetMODE: Network Motif Detection without Nauty , 2012, PloS one.

[13]  Aristides Gionis,et al.  Mining Large Networks with Subgraph Counting , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[14]  Jeffrey D. Ullman,et al.  Enumerating subgraph instances using map-reduce , 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).

[15]  R. Solé,et al.  Are network motifs the spandrels of cellular complexity? , 2006, Trends in ecology & evolution.

[16]  Noga Alon,et al.  Biomolecular network motif counting and discovery by color coding , 2008, ISMB.

[17]  Joshua A. Grochow,et al.  Network Motif Discovery Using Subgraph Enumeration and Symmetry-Breaking , 2007, RECOMB.

[18]  F. Schreiber,et al.  MODA: an efficient algorithm for network motif discovery in biological networks. , 2009, Genes & genetic systems.

[19]  Uri Alon,et al.  Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs , 2004, Bioinform..

[20]  Uri Alon,et al.  Coarse-graining and self-dissimilarity of complex networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[21]  Lin Ma,et al.  Parallel subgraph listing in a large-scale graph , 2014, SIGMOD Conference.

[22]  Edward B. Suh,et al.  A parallel algorithm for extracting transcriptional regulatory network motifs , 2005, Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'05).

[23]  Philip S. Yu,et al.  Mining significant graph patterns by leap search , 2008, SIGMOD Conference.

[24]  S. Shen-Orr,et al.  Network motifs: simple building blocks of complex networks. , 2002, Science.

[25]  Brendan D. McKay,et al.  Practical graph isomorphism, II , 2013, J. Symb. Comput..

[26]  Fernando M. A. Silva,et al.  g-tries: an efficient data structure for discovering network motifs , 2010, SAC '10.

[27]  Xiaokui Xiao,et al.  Large-scale frequent subgraph mining in MapReduce , 2014, 2014 IEEE 30th International Conference on Data Engineering.

[28]  Frans Coenen,et al.  A survey of frequent subgraph mining algorithms , 2012, The Knowledge Engineering Review.

[29]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[30]  Madhav V. Marathe,et al.  Subgraph Enumeration in Large Social Contact Networks Using Parallel Color Coding and Streaming , 2010, 2010 39th International Conference on Parallel Processing.

[31]  Ina Koch,et al.  QuateXelero: An Accelerated Exact Network Motif Detection Algorithm , 2013, PloS one.

[32]  Bingsheng He,et al.  Frequent itemset mining on graphics processors , 2009, DaMoN '09.

[33]  Jeong-Hoon Lee,et al.  Turboiso: towards ultrafast and robust subgraph isomorphism search in large graph databases , 2013, SIGMOD '13.

[34]  Sahar Asadi,et al.  Kavosh: a new algorithm for finding network motifs , 2009, BMC Bioinformatics.

[35]  Mong-Li Lee,et al.  NeMoFinder: dissecting genome-wide protein-protein interactions with meso-scale network motifs , 2006, KDD '06.