R-Based Software for the Integration of Pathway Data into Bioinformatic Algorithms

Putting new findings into the context of available literature knowledge is one approach to dealing with the surge of high-throughput data. Furthermore, prior knowledge can increase the performance and stability of bioinformatic algorithms, for example, methods for network reconstruction. In this review, we examine software packages for the statistical computing framework R that enable the integration of pathway data into further bioinformatic analyses. We identify different approaches to integrating and visualizing pathway data and compare the packages with respect to five aspects: data import strategies, the extent of available data, dependencies on external tools, integration with further analysis steps, and visualization options. A total of 12 packages integrating pathway data are reviewed in this manuscript. These are supplemented by five R-specific packages for visualization and six connector packages, which provide access to external tools.
