GSGS: A Computational Framework to Reconstruct Signaling Pathways from Gene Sets

We propose a novel two-stage Gene Set Gibbs Sampling (GSGS) framework to reverse engineer signaling pathways from gene sets inferred from molecular profiling data. We hypothesize that signaling pathways are structurally an ensemble of overlapping linear signal transduction events, which we encode as Information Flow Gene Sets (IFGSs). We infer pathways from the gene sets that correspond to these events after the genes within each set have been randomly permuted, so the original order of each event is unobserved. In Stage I, we use a source separation algorithm to derive unordered, overlapping IFGSs from molecular profiling data, allowing cross talk among IFGSs. In Stage II, we develop a Gibbs-sampling-like algorithm, the Gene Set Gibbs Sampler, to reconstruct signaling pathways from the latent IFGSs derived in Stage I. The novelty of this framework lies in the seamless integration of the two stages and in the hypothesis that IFGSs are the basic building blocks of signaling pathways. In proof-of-concept studies on both continuous and discrete data generated from benchmark networks in the DREAM initiative, our approach outperforms existing Bayesian network approaches. We perform a comprehensive sensitivity analysis to assess the robustness of the approach. Finally, we apply the GSGS framework to reconstruct signaling pathways in breast cancer cells.
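The Stage II idea, recovering an order for each unordered gene set by repeatedly resampling it conditional on the current orders of all other sets, can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring rule (a product of smoothed adjacent-pair counts), the pseudocount, the function names, and the toy gene sets are all assumptions made for illustration. The sketch enumerates every permutation of a set, so it is only practical for very small sets, and without additional directional information an ordering and its reverse can be equally well supported.

```python
import itertools
import random
from collections import Counter


def transition_counts(orders, skip):
    """Count directed adjacent gene pairs over all ordered sets except index `skip`."""
    counts = Counter()
    for i, order in enumerate(orders):
        if i == skip:
            continue
        counts.update(zip(order, order[1:]))
    return counts


def order_weight(order, counts, pseudo=0.1):
    """Unnormalized weight of an ordering: product of smoothed adjacent-pair counts.

    The pseudocount `pseudo` is an illustrative assumption, not a value from the paper.
    """
    weight = 1.0
    for edge in zip(order, order[1:]):
        weight *= counts[edge] + pseudo
    return weight


def gibbs_orderings(gene_sets, sweeps=50, seed=0):
    """Gibbs-like sweeps: resample each set's order given the orders of the others."""
    rng = random.Random(seed)
    # Start from a random order for every set.
    orders = [tuple(rng.sample(sorted(s), len(s))) for s in gene_sets]
    for _ in range(sweeps):
        for i, genes in enumerate(gene_sets):
            counts = transition_counts(orders, skip=i)
            # Exhaustive enumeration: feasible only for small sets.
            candidates = list(itertools.permutations(sorted(genes)))
            weights = [order_weight(c, counts) for c in candidates]
            orders[i] = rng.choices(candidates, weights=weights, k=1)[0]
    return orders


def edge_counts(orders):
    """Aggregate adjacent pairs from the sampled orders into candidate directed edges."""
    return transition_counts(orders, skip=None)


if __name__ == "__main__":
    # Hypothetical toy gene sets with overlapping members.
    toy_sets = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "B"}, {"C", "D"}]
    sampled = gibbs_orderings(toy_sets)
    for order in sampled:
        print(" -> ".join(order))
    print(edge_counts(sampled).most_common(3))
```

In this sketch a candidate network is read off by pooling adjacent pairs across the sampled orders. A fuller treatment would average over many sweeps after a burn-in period rather than using only the final state.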
