Differential Community Detection in Paired Biological Networks

Motivation: Biological networks capture the structure of molecular interactions, and analysing them can lead to the discovery of driver genes and meaningful pathways, especially in cancer. Gene mutations often alter gene expression, and the corresponding gene regulatory network undergoes localized re-wiring. Identifying significant changes in interaction patterns caused by disease progression can reveal novel, relevant signatures.

Methods: The task of identifying differential sub-networks in paired biological networks (A: control, B: case) can be re-phrased as finding dense communities in a single noisy differential topological (DT) graph, constructed by taking the absolute difference between the topological graphs of A and B. In this paper, we propose a fast two-stage approach, Differential Community Detection (DCD), which identifies differential sub-networks as differential communities in a de-noised version of the DT graph. In the first stage, we iteratively re-order the nodes of the DT graph, using the nodes' neighbourhood information and Jaccard similarity, to expose approximate block diagonals in the DT adjacency matrix. In the second stage, we traverse the ordered DT adjacency matrix along the diagonal and remove all edges associated with a node if that node has no immediate edges within a window. We then apply community detection methods to this de-noised DT graph to discover differential sub-networks as communities.

Results: The DCD approach effectively locates differential sub-networks in several simulated paired random-geometric networks and in various paired scale-free graphs with different power-law exponents. In simulation studies, DCD outperforms both community detection methods applied to the original noisy DT graph and recent statistical techniques.
We applied the DCD method to two real datasets: (a) an ovarian cancer dataset, to discover differential DNA co-methylation sub-networks between patients and controls; and (b) a glioma dataset, to discover differences between the regulatory networks of IDH-mutant and IDH-wild-type gliomas. We demonstrate the potential benefits of DCD for finding network-inferred biomarkers and pathways associated with a trait of interest.

Conclusion: The proposed DCD approach overcomes limitations of previous statistical techniques and the issues that arise when community detection methods are applied directly to the noisy DT graph. This is reflected in the superior performance of DCD on metrics such as Precision, Accuracy, Kappa and Specificity. Code implementing the proposed DCD method is available at https://sites.google.com/site/raghvendramallmlresearcher/codes.
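
To make the two-stage idea from the Methods paragraph concrete, the following is a minimal Python sketch, not the authors' released implementation. It rests on several assumptions that the abstract does not specify: the "topological graphs" are taken to be plain binary adjacency matrices, the re-ordering is a simple greedy pass that appends the unplaced node whose closed neighbourhood has the highest Jaccard similarity to the last placed node, and the window test is evaluated on the original DT matrix in one pass. All function names and the `window` parameter are hypothetical.

```python
def to_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    m = [[0] * n for _ in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = 1
    return m


def differential_graph(adj_a, adj_b):
    """DT graph: entry-wise absolute difference of two equally sized matrices."""
    n = len(adj_a)
    return [[abs(adj_a[i][j] - adj_b[i][j]) for j in range(n)] for i in range(n)]


def jaccard(s, t):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def reorder_nodes(dt):
    """Stage 1 (assumed greedy variant): order nodes so that nodes with
    similar neighbourhoods end up adjacent, approximating block diagonals."""
    n = len(dt)
    # Closed neighbourhood: the node itself plus every node it has a DT edge to.
    nbrs = [{i} | {j for j in range(n) if dt[i][j] > 0} for i in range(n)]
    start = max(range(n), key=lambda i: len(nbrs[i]))
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Ties are broken by the smallest node index.
        nxt = max(sorted(remaining), key=lambda j: jaccard(nbrs[last], nbrs[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def denoise(dt, order, window=2):
    """Stage 2: drop every edge of a node that has no DT edge to any node
    lying within `window` positions of it in the ordering."""
    n = len(order)
    kept = set()
    for pos, node in enumerate(order):
        lo, hi = max(0, pos - window), min(n, pos + window + 1)
        if any(dt[node][order[q]] > 0 for q in range(lo, hi) if q != pos):
            kept.add(node)
    # An edge survives only if both of its endpoints were kept.
    return [[dt[i][j] if i in kept and j in kept else 0 for j in range(n)]
            for i in range(n)]


# Toy pair: the triangle 0-1-2 exists only in A; the edge 0-5 exists only in B.
adj_a = to_matrix(6, [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5)])
adj_b = to_matrix(6, [(3, 4), (4, 5), (0, 5)])
dt = differential_graph(adj_a, adj_b)
order = reorder_nodes(dt)
clean = denoise(dt, order, window=2)
edges = sorted((i, j) for i in range(6) for j in range(i + 1, 6) if clean[i][j])
print(order, edges)
```

In this toy example the isolated difference edge 0-5 falls outside the window once the triangle's nodes are placed next to each other, so only the triangle 0-1-2 remains in the de-noised DT graph. Any community detection method could then be run on `clean`; the choice of `window` trades off noise removal against the risk of discarding sparse but genuine differential structure.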