ToPASeq: an R package for topology-based pathway analysis of microarray and RNA-Seq data

Background
Pathway analysis methods, in which differentially expressed genes are mapped to databases of reference pathways and their relative enrichment is assessed, help investigators to propose biologically relevant hypotheses. The latest generation of pathway analysis methods takes into account the topological structure of a pathway, which helps to increase both the specificity and the sensitivity of the findings. At the same time, RNA-Seq technology is gaining popularity and is becoming widely used for gene expression profiling. Unfortunately, the majority of topology-based pathway analysis methods remain without an implementation, and where an implementation exists, it is limited in various respects.

Results
We developed a new R/Bioconductor package, ToPASeq, offering a uniform interface to seven distinct topology-based pathway analysis methods. We implemented three of these de novo and adapted the other four from existing implementations. In addition, ToPASeq offers a set of tailored visualization functions and functions for importing and manipulating pathways and their topologies, which facilitate the application of the methods to different species. The package can be used to compare the differential expression of pathways between two conditions on both gene expression microarray and RNA-Seq data. The package is written in R and is available from Bioconductor 3.2 under the AGPL-3 license.

Conclusion
ToPASeq is a novel package that offers seven distinct methods for topology-based pathway analysis, which are easily applicable to microarray as well as RNA-Seq data, in both human and other species. It also provides specific tools for visualizing the results.
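The Background contrasts classic enrichment, which looks only at which genes are differentially expressed, with topology-based methods that also use the pathway's edges. A minimal Python sketch of that contrast follows. It assumes a toy pathway whose gene names, log fold changes, edge signs and gene universe are all hypothetical. The enrichment half is a standard one-sided hypergeometric over-representation test. The topology half is loosely modelled on the perturbation-factor equation published for SPIA and is restricted to acyclic graphs. Neither function is part of ToPASeq's interface, and neither is claimed to reproduce any of the seven methods the package wraps.

```python
"""Toy contrast between a topology-free over-representation test and a
topology-aware perturbation score. Hypothetical data; not ToPASeq's API."""
from graphlib import TopologicalSorter
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K in pathway, n drawn)."""
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible outcomes contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def over_representation_p(de_genes, pathway_genes, universe) -> float:
    """One-sided enrichment p-value; uses gene membership only, no edges."""
    de = set(de_genes) & universe
    pathway = set(pathway_genes) & universe
    return hypergeom_sf(len(de & pathway), len(universe), len(pathway), len(de))


def perturbation_factors(log_fc, edges):
    """Propagate fold changes along a directed acyclic pathway graph.

    PF(g) = dE(g) + sum over upstream u of beta(u, g) * PF(u) / n_down(u),
    with beta = +1 (activation) or -1 (inhibition) and n_down(u) the number
    of downstream targets of u. Every gene in `edges` must appear in `log_fc`.
    """
    upstream = {g: [] for g in log_fc}
    n_down = {g: 0 for g in log_fc}
    for src, dst, beta in edges:
        upstream[dst].append((src, beta))
        n_down[src] += 1
    # TopologicalSorter takes node -> predecessors and yields upstream
    # genes first; it raises graphlib.CycleError on feedback loops.
    sorter = TopologicalSorter({g: [u for u, _ in ups] for g, ups in upstream.items()})
    pf = {}
    for g in sorter.static_order():
        pf[g] = log_fc[g] + sum(beta * pf[u] / n_down[u] for u, beta in upstream[g])
    return pf


# Hypothetical four-gene pathway: A activates B; B activates C and inhibits D.
log_fc = {"A": 2.0, "B": 0.0, "C": 0.0, "D": 0.0}
edges = [("A", "B", +1), ("B", "C", +1), ("B", "D", -1)]
universe = {"A", "B", "C", "D"} | {f"g{i}" for i in range(16)}  # 20 genes
de_genes = {"A"}  # only A passes a (hypothetical) DE cutoff

if __name__ == "__main__":
    print("ORA p-value:", over_representation_p(de_genes, log_fc, universe))
    print("perturbation factors:", perturbation_factors(log_fc, edges))
```

In this toy case the over-representation test sees one DE gene out of four pathway members and returns a p-value of 0.2. The perturbation factors show that A's change propagates to B and C and, with inverted sign, to D. The sketch omits significance assessment, which any real method needs. Graphs with feedback loops would also require solving the corresponding linear system instead of a single pass in topological order.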
