Design and validation issues in RNA-seq experiments

Next-generation sequencing technologies are being rapidly adopted in biological research. Depending on the application, the tens of millions of short reads generated in a single experiment can provide extensive information on genome composition, genetic variants, gene expression levels, and protein binding sites. Numerous methods are being developed to analyze the data these technologies produce; however, the associated experimental design issues have rarely been discussed. In this review, we use RNA-seq as an example to bring this topic into focus, and we discuss the experimental design and validation issues that arise when next-generation sequencing is used to quantify transcripts.
