A dynamic programming approach to integrate gene expression data and network information for pathway model generation

MOTIVATION: As large amounts of biological data continue to be generated rapidly, a major focus of bioinformatics research has been integrating these data to identify pathways or modules that are active under particular experimental conditions or phenotypes. Although many existing methods can detect biologically significant modules at a global level, their results are often hard to interpret or to use for pathway model generation and testing.

RESULTS: To address this gap, we have developed the IMPRes algorithm, a new step-wise active pathway detection method that uses a dynamic programming approach. IMPRes takes advantage of the existing pathway interaction knowledge in KEGG. Omics data are then used to assign penalties to genes, interactions, and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect the downstream pathways that best explain the gene expression data. Because dynamic programming performs the detection one step at a time, researchers can easily trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. Evaluation experiments on three yeast data sets show that IMPRes achieves performance competitive with or better than other state-of-the-art methods. Furthermore, a case study on a human lung cancer data set provided several insights into genes and mechanisms involved in lung cancer that had not been reported before.

AVAILABILITY: The IMPRes visualization tool is available via a web server at http://digbio.missouri.edu/impres.
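The seed-based shortest-path step described above can be illustrated with a minimal sketch. It is not the IMPRes implementation: the abstract does not state how penalties are computed or combined, so the sketch assumes non-negative penalties that simply add up along a path, and it uses Dijkstra's algorithm as a stand-in for the shortest path search. The function names, gene names, and penalty values are all hypothetical.

```python
import heapq


def lowest_penalty_paths(edges, node_penalty, seeds):
    """Multi-seed shortest-path search over a directed interaction network.

    edges:        dict mapping a gene to a list of (downstream_gene, edge_penalty)
    node_penalty: dict mapping a gene to its penalty (assumed derived from omics data)
    seeds:        iterable of starting genes
    All penalties are assumed to be non-negative (required by Dijkstra's algorithm).
    """
    best = {s: node_penalty.get(s, 0.0) for s in seeds}
    parent = {s: None for s in seeds}
    heap = [(cost, gene) for gene, cost in best.items()]
    heapq.heapify(heap)

    while heap:
        cost, gene = heapq.heappop(heap)
        if cost > best[gene]:
            continue  # stale queue entry; a cheaper route was already found
        for nxt, edge_pen in edges.get(gene, []):
            # Assumed cost model: path cost = sum of edge and node penalties.
            new_cost = cost + edge_pen + node_penalty.get(nxt, 0.0)
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                parent[nxt] = gene
                heapq.heappush(heap, (new_cost, nxt))
    return best, parent


def trace(parent, gene):
    """Walk back from a gene to its seed, one step at a time."""
    path = []
    while gene is not None:
        path.append(gene)
        gene = parent[gene]
    return path[::-1]


# Toy network with hypothetical genes and penalties.
edges = {
    "A": [("B", 1.0), ("C", 0.2)],
    "B": [("D", 0.1)],
    "C": [("D", 0.5)],
}
node_penalty = {"A": 0.0, "B": 0.9, "C": 0.1, "D": 0.2}

best, parent = lowest_penalty_paths(edges, node_penalty, seeds=["A"])
print(trace(parent, "D"))  # ['A', 'C', 'D']
```

Storing a parent pointer for each gene is what makes the result traceable step by step: each downstream gene can be followed back to its seed along the lowest-penalty route.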
