Integrated inference and analysis of regulatory networks from multi-level measurements

Regulatory and signaling networks coordinate the enormously complex interactions that control cellular processes (such as metabolism and cell division), mediate responses to the environment, and carry out many cellular decisions (such as development and quorum sensing). Regulatory network inference is the process of reconstructing these networks from measurements. It has traditionally relied on microarray data but increasingly incorporates other measurement types such as proteomics, ChIP-seq, metabolomics, and mass cytometry. We discuss existing techniques for network inference. We then review our own pipeline in detail. It consists of three steps: a biclustering step, designed to estimate co-regulated groups of genes; a network inference step, designed to select and parameterize likely regulatory models for the control of the co-regulated groups found by biclustering; and a visualization and analysis step, designed to find and communicate key features of the network. Learning biological networks from even the most complete data sets is challenging; we argue that integrating new data types into the inference pipeline produces networks of increased accuracy, validity, and biological relevance.
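To make the three-step structure concrete, here is a minimal, self-contained Python sketch. It is not the pipeline reviewed here: each step is replaced by a deliberately simple stand-in chosen only for illustration. Greedy correlation-based grouping of genes stands in for biclustering (real biclusters also select a subset of conditions). Ranking candidate regulators by their correlation with each group's mean profile stands in for selecting and parameterizing regulatory models. Counting how many groups each regulator controls stands in for network analysis. All function names (`group_genes`, `infer_regulators`, `hub_regulators`), the correlation threshold, and the toy expression data are hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def group_genes(expr, regulators, threshold=0.9):
    """Step 1 stand-in: greedily group target genes whose profiles correlate with a
    group's first member. Unlike true biclustering, every group spans all conditions."""
    modules = []
    for gene in sorted(g for g in expr if g not in regulators):
        for module in modules:
            if pearson(expr[gene], expr[module[0]]) >= threshold:
                module.append(gene)
                break
        else:  # no existing group matched, so start a new one
            modules.append([gene])
    return modules


def module_profile(expr, module):
    """Mean expression of a module's genes in each condition."""
    return [mean(values) for values in zip(*(expr[g] for g in module))]


def infer_regulators(expr, modules, regulators, top_k=1):
    """Step 2 stand-in: for each module, keep the top_k candidate regulators whose
    profiles are most strongly (positively or negatively) correlated with the module mean."""
    network = {}
    for i, module in enumerate(modules):
        profile = module_profile(expr, module)
        ranked = sorted(regulators, key=lambda r: -abs(pearson(expr[r], profile)))
        network[i] = ranked[:top_k]
    return network


def hub_regulators(network):
    """Step 3 stand-in: rank regulators by how many modules they control."""
    counts = {}
    for regs in network.values():
        for r in regs:
            counts[r] = counts.get(r, 0) + 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data: two regulators and four targets measured in four conditions.
expr = {
    "tfA": [1, 2, 3, 4],
    "tfB": [4, 1, 3, 2],
    "g1": [2, 4, 6, 8],
    "g2": [1.1, 2.0, 3.2, 3.9],
    "g3": [8, 2, 6, 4],
    "g4": [4.2, 0.9, 3.1, 2.0],
}
regulators = ["tfA", "tfB"]

modules = group_genes(expr, regulators)                # [['g1', 'g2'], ['g3', 'g4']]
network = infer_regulators(expr, modules, regulators)  # {0: ['tfA'], 1: ['tfB']}
print(hub_regulators(network))                         # [('tfA', 1), ('tfB', 1)]
```

Keeping each step behind its own function mirrors the modular design described above. Any stage could be swapped for a stronger method, or fed an additional data type, without changing the others.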
