GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences

GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable to prokaryotes and to non-model organisms that lack an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can substitute any preferred program that is compliant with the Sequence Alignment/Map (SAM) format. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages, each based on a variation of the negative binomial distribution. The default method is a new and simple statistical test that we developed, based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent, and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next-generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays and the substantial overlap observed in results from each of the three statistics packages demonstrate that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts.
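As a rough illustration of the counting step that any SAM-based pipeline of this kind has to perform, the sketch below tallies mapped reads per gene from SAM-formatted alignment lines. It is written in Python rather than Perl and is not GENE-counter's own code: the `GENES` table, the `count_reads` function, and the rule of assigning a read by its leftmost mapped position are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical annotation: gene name -> (reference name, start, end),
# using 1-based inclusive coordinates as in the SAM POS field.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}


def count_reads(sam_lines, genes):
    """Count mapped reads whose leftmost position falls inside each gene."""
    counts = Counter({name: 0 for name in genes})
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # a SAM alignment line has 11 mandatory fields
        flag = int(fields[1])
        rname = fields[2]
        pos = int(fields[3])
        if flag & 0x4:
            continue  # bit 0x4 marks an unmapped read
        for name, (ref, start, end) in genes.items():
            if rname == ref and start <= pos <= end:
                counts[name] += 1
    return dict(counts)
```

The resulting per-gene counts are the kind of input that a negative binomial test for differential expression operates on. A production pipeline would also need to handle spliced alignments, reads that map to multiple locations, and strand, none of which this sketch attempts.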
