GSGS: A Computational Approach to Reconstruct Signaling Pathway Structures from Gene Sets

Reconstruction of signaling pathway structures is essential to decipher complex regulatory relationships in living cells. Existing computational approaches often rely on unrealistic biological assumptions and do not explicitly consider signal transduction mechanisms. Signal transduction events are linear cascades of reactions from the cell surface to the nucleus, and they characterize a signaling pathway. In this paper, we propose a novel approach, Gene Set Gibbs Sampling (GSGS), to reverse engineer signaling pathway structures from gene sets related to the pathways. We hypothesize that signaling pathways are structurally an ensemble of overlapping linear signal transduction events, which we encode as Information Flows (IFs). We infer signaling pathway structures from the gene sets corresponding to these events, referred to as Information Flow Gene Sets (IFGSs). An IFGS reflects only which genes appear in the underlying IF, not their ordering. GSGS offers a Gibbs sampling-like procedure that reconstructs the underlying signaling pathway structure by sequentially inferring IFs from the overlapping IFGSs related to the pathway. In proof-of-concept studies, our approach outperforms state-of-the-art network inference approaches on both continuous and discrete data generated from benchmark networks in the DREAM initiative. We perform a comprehensive sensitivity analysis to assess the robustness of our approach. Finally, we apply GSGS to reconstruct signaling mechanisms in breast cancer cells.
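The idea of sequentially inferring an ordering for each gene set, conditioned on the orderings currently assigned to the others, can be illustrated with a small sketch. The abstract does not give the scoring model, so everything below is an assumption made for illustration: each IF is scored as a first-order chain whose transition weights are counted from the other IFs, candidate orderings are enumerated exhaustively (feasible only for small sets), and the names `sample_orderings`, `pseudocount` and `n_sweeps` are hypothetical. It is not the authors' implementation.

```python
import itertools
import math
import random
from collections import Counter


def sample_orderings(gene_sets, n_sweeps=200, pseudocount=0.1, seed=0):
    """Toy Gibbs-like sampler over orderings of unordered gene sets.

    Assumption (not from the paper): an ordering is scored by the product of
    smoothed counts of its consecutive gene pairs in the other orderings.
    """
    rng = random.Random(seed)
    # Start from a random ordering of every gene set.
    orders = [rng.sample(sorted(gs), len(gs)) for gs in gene_sets]
    edge_votes = Counter()

    for _ in range(n_sweeps):
        for i, gs in enumerate(gene_sets):
            # Count consecutive pairs in all orderings except the i-th one.
            counts = Counter()
            for j, other in enumerate(orders):
                if j != i:
                    counts.update(zip(other, other[1:]))
            # Weight every candidate ordering of this gene set.
            candidates = list(itertools.permutations(sorted(gs)))
            weights = [
                math.prod(counts[pair] + pseudocount for pair in zip(p, p[1:]))
                for p in candidates
            ]
            # Resample the i-th ordering given the others.
            orders[i] = list(rng.choices(candidates, weights=weights, k=1)[0])
        # After each sweep, record the directed edges implied by the orderings.
        for o in orders:
            edge_votes.update(zip(o, o[1:]))

    return orders, edge_votes


if __name__ == "__main__":
    toy_sets = [{"EGFR", "GRB2", "SOS1"}, {"GRB2", "SOS1", "KRAS"}]
    final_orders, votes = sample_orderings(toy_sets, n_sweeps=50)
    for edge, n in votes.most_common(5):
        print(edge, n)
```

Edges that accumulate many votes across sweeps would be the candidates for the reconstructed structure. In this toy version, reversing all orderings at once gives the same score, so it illustrates the sampling loop rather than the recovery of edge direction.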
