A linear delay algorithm for enumerating all connected induced subgraphs

Background

Real biological and social data are increasingly represented as graphs. Pattern-mining-based graph learning and analysis techniques report meaningful biological subnetworks that elucidate important interactions among entities. At the core of these algorithms is the enumeration of the pattern space.

Results

We propose an efficient algorithm for enumerating all connected induced subgraphs of an undirected graph. Building on this enumeration approach, we propose an algorithm for mining all maximal cohesive subgraphs that integrates vertex attributes with subgraph enumeration. To mine all maximal cohesive subgraphs efficiently, we propose two pruning techniques that remove futile search nodes from the enumeration tree.

Conclusions

Experiments on synthetic and real graphs show the effectiveness of the proposed algorithm and the pruning techniques. For enumerating all connected induced subgraphs, our algorithm is several times faster than existing approaches. On dense graphs, the proposed approach is at least an order of magnitude faster than the best existing algorithm. Experiments on a protein-protein interaction network with a cancer gene dysregulation profile show that the reported cohesive subnetworks are biologically interesting.
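To make the enumeration task concrete, the following is a minimal Python sketch that lists every connected induced subgraph (as a vertex set) of a small undirected graph exactly once. It is not the linear delay algorithm proposed in the paper, whose details the abstract does not give; it is a simple root-and-extend scheme in the style of the well-known ESU subgraph enumeration. The function names and the adjacency-dictionary input format are assumptions made for illustration.

```python
def connected_induced_subgraphs(adj):
    """Yield each connected vertex set of an undirected graph exactly once.

    `adj` maps every vertex to an iterable of its neighbours (hypothetical
    input format). Illustrative sketch only, not the paper's algorithm.
    """
    order = sorted(adj)
    for i, root in enumerate(order):
        # Each set is generated from its smallest vertex, so vertices
        # up to and including the root may never be added later.
        banned = set(order[: i + 1])
        frontier = [u for u in adj[root] if u not in banned]
        yield from _grow(adj, [root], frontier, banned | set(frontier))


def _grow(adj, current, frontier, seen):
    """Report `current`, then extend it by one frontier vertex at a time."""
    yield frozenset(current)
    frontier = list(frontier)  # local copy; the caller's list is untouched
    while frontier:
        u = frontier.pop()
        # Only neighbours of u that were never seen on this branch join the
        # frontier, which prevents reaching the same set by two routes.
        new = [w for w in adj[u] if w not in seen]
        yield from _grow(adj, current + [u], frontier + new, seen | set(new))


if __name__ == "__main__":
    # Path graph 0 - 1 - 2
    path = {0: [1], 1: [0, 2], 2: [1]}
    for vertex_set in connected_induced_subgraphs(path):
        print(sorted(vertex_set))
```

Once a vertex is popped from the frontier it is only ever included in the branch opened at that point, so sets containing it and sets avoiding it are explored in disjoint parts of the search tree. The number of connected induced subgraphs can grow exponentially with graph size, which is why the delay between consecutive outputs and the pruning of futile search nodes matter in practice.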
