Optimal structural inference of signaling pathways from unordered and overlapping gene sets

MOTIVATION Numerous bioinformatics analyses have led to the discovery of many gene sets, which can be interpreted as discrete measurements emitted from latent signaling pathways. Their potential for inferring signaling pathway structures, however, has not been sufficiently exploited. Existing methods that accommodate discrete data do not explicitly consider the signal cascading mechanisms that characterize a signaling pathway. Novel computational methods are thus needed to fully utilize gene sets and to broaden the inference of signaling pathway structures from pairwise interactions alone to the more general cascading events.

RESULTS We propose a gene set based simulated annealing (SA) algorithm for the reconstruction of signaling pathway structures. A signaling pathway structure is a directed graph containing up to a few hundred nodes and many overlapping signal cascades, where each cascade represents a chain of molecular interactions from the cell surface to the nucleus. Gene sets in our context are discrete sets of genes participating in signal cascades, the basic building blocks of a signaling pathway, with no prior information about the ordering of genes within a cascade. From a compendium of gene sets related to a pathway, SA searches for the signal cascades that characterize the optimal signaling pathway structure. During the search, the extent of overlap among signal cascades is used to measure the optimality of a structure. Throughout, we treat gene sets as random samples from a first-order Markov chain model. We evaluated the performance of SA in three case studies. In the first study, conducted on 83 KEGG pathways, SA performed significantly better than Bayesian network methods. Because both SA and Bayesian network methods accommodate discrete data, use a 'search and score' network learning strategy and output a directed network, they can be compared in terms of both performance and computational time. In the second study, we compared SA and Bayesian network methods using four benchmark datasets from DREAM. In the final study, we showcased two context-specific signaling pathways activated in breast cancer.

AVAILABILITY Source code is available from http://dl.dropbox.com/u/16000775/sa_sc.zip.
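The search described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a simplified energy function (the number of distinct directed edges implied by all ordered gene sets, so that fewer distinct edges means more overlap among cascades), a swap-two-genes move, and a geometric cooling schedule. The function names, parameter values, and toy gene labels are all hypothetical.

```python
import math
import random
from itertools import chain


def edges_of(order):
    """Directed edges implied by one ordered cascade (consecutive gene pairs)."""
    return list(zip(order, order[1:]))


def distinct_edge_count(orders):
    """Assumed energy: number of distinct directed edges over all cascades.

    Fewer distinct edges means the cascades share more edges, i.e. overlap more.
    This is a stand-in for the paper's score, not the score itself.
    """
    return len(set(chain.from_iterable(edges_of(o) for o in orders)))


def anneal_orderings(gene_sets, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    """Search for an ordering of each unordered gene set by simulated annealing."""
    rng = random.Random(seed)
    # Start from a random ordering of every gene set (sorted first for reproducibility).
    orders = [rng.sample(sorted(s), len(s)) for s in gene_sets]
    energy = distinct_edge_count(orders)
    best, best_energy = [o[:] for o in orders], energy
    temp = t0
    for _ in range(steps):
        temp *= cooling  # geometric cooling (an assumed schedule)
        k = rng.randrange(len(orders))
        if len(orders[k]) < 2:
            continue  # nothing to reorder in a single-gene set
        # Propose a neighbor: swap two positions within one cascade.
        i, j = rng.sample(range(len(orders[k])), 2)
        orders[k][i], orders[k][j] = orders[k][j], orders[k][i]
        new_energy = distinct_edge_count(orders)
        delta = new_energy - energy
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy
            if energy < best_energy:
                best, best_energy = [o[:] for o in orders], energy
        else:
            # Reject the move: undo the swap.
            orders[k][i], orders[k][j] = orders[k][j], orders[k][i]
    return best, best_energy


if __name__ == "__main__":
    # Toy, made-up gene sets; the labels carry no biological meaning.
    toy_sets = [{"A", "B", "C"}, {"B", "C", "D"}, {"A", "B", "C", "D"}]
    cascades, score = anneal_orderings(toy_sets, steps=5000, seed=1)
    for cascade in cascades:
        print(" -> ".join(cascade))
    print("distinct edges:", score)
```

In this sketch the union of the returned cascades' edges forms the inferred directed graph. A closer match to the abstract would replace `distinct_edge_count` with a likelihood under a first-order Markov chain, which the sketch leaves out for brevity.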
