Reverse enGENEering of Regulatory Networks from Big Data: A Roadmap for Biologists

Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by omics datasets is how to transform the data into biological knowledge, for example, how to use them to answer questions such as: Which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to this problem. It consists of two fundamental stages: network reconstruction and network interrogation. Here we provide an overview of network analysis, including a step-by-step guide on how to perform this approach and use it to investigate a biological question. For each step of the network analysis workflow, the guide also lists the software packages that we and others employ.
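As a rough illustration of the two stages named above, the sketch below reconstructs a toy co-expression network and then interrogates it. It is a minimal example under stated assumptions, not the workflow or software described in the guide: the use of Pearson correlation, the 0.9 threshold, degree ranking as the interrogation step, the function names, and the toy expression values are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def reconstruct_network(expression, threshold=0.9):
    """Stage 1 (reconstruction): link genes whose |correlation| >= threshold.

    The threshold is an assumed, illustrative cutoff.
    """
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def rank_by_degree(edges):
    """Stage 2 (interrogation): rank genes by their number of connections."""
    degree = {}
    for g1, g2 in edges:
        degree[g1] = degree.get(g1, 0) + 1
        degree[g2] = degree.get(g2, 0) + 1
    # Highest degree first; ties broken alphabetically for a stable order.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Toy data invented for illustration: gene -> expression across 5 samples.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 3.0, 1.0],
}
toy_edges = reconstruct_network(toy_expression)
toy_ranking = rank_by_degree(toy_edges)
```

In this toy example, geneA, geneB, and geneC end up mutually connected, while geneD is uncorrelated with the others and stays out of the network. Real analyses would typically add normalization, multiple-testing control, and more robust inference methods before any interrogation step.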
