GSVA: The Gene Set Variation Analysis package for microarray and RNA-seq data

The GSVA package implements Gene Set Variation Analysis (GSVA), a non-parametric, unsupervised method for assessing gene set enrichment (GSE) in gene expression microarray and RNA-seq data. In contrast to most GSE methods, GSVA performs a change of coordinate systems, transforming the data from a gene-by-sample matrix into a gene-set-by-sample matrix, thereby allowing pathway enrichment to be evaluated for each sample. Because this transformation does not use a phenotype, it facilitates powerful, open-ended analyses in a pathway-centric manner. In this vignette we illustrate how to use the GSVA package to perform some of these analyses using published microarray and RNA-seq data that have already been pre-processed and stored in the companion experimental data package GSVAdata.
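
The change of coordinates described above, from a gene-by-sample matrix to a gene-set-by-sample matrix, can be illustrated with a small sketch. This is not the GSVA statistic and does not use the GSVA package: as a deliberately simplified stand-in, it scores each gene set in each sample by the mean of per-gene z-scores. The function name `gene_set_scores`, the toy gene names, and the toy gene sets are all hypothetical and serve only to show the shape of the transformation and that no phenotype labels are needed.

```python
from statistics import mean, pstdev


def gene_set_scores(expr, gene_sets):
    """Collapse a gene-by-sample matrix into a gene-set-by-sample matrix.

    expr:      dict mapping gene name -> list of expression values (one per sample)
    gene_sets: dict mapping gene set name -> list of gene names

    Assumption: the per-sample score is the mean z-score of the member genes.
    This is a simplified placeholder, NOT the statistic computed by GSVA.
    """
    n_samples = len(next(iter(expr.values())))

    # Standardize each gene across samples so genes are on a comparable scale.
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]

    # One row per gene set, one column per sample; no phenotype is involved.
    scores = {}
    for name, members in gene_sets.items():
        present = [g for g in members if g in z]
        if not present:
            continue  # skip gene sets with no measured genes
        scores[name] = [mean(z[g][j] for g in present) for j in range(n_samples)]
    return scores


# Toy input: 4 genes x 3 samples (hypothetical values).
expr = {
    "A": [1.0, 2.0, 3.0],
    "B": [2.0, 4.0, 6.0],
    "C": [5.0, 5.0, 5.0],
    "D": [9.0, 6.0, 3.0],
}
gene_sets = {"set1": ["A", "B"], "set2": ["C", "D", "X"], "set3": ["Y"]}

scores = gene_set_scores(expr, gene_sets)
for name, row in scores.items():
    print(name, [round(s, 3) for s in row])
```

The result has one row per gene set and one column per sample, so downstream tools that expect an expression-like matrix can be applied to it directly at the pathway level.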
