RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays.

Ultra-high-throughput sequencing is emerging as an attractive alternative to microarrays for genotyping, analysis of methylation patterns, and identification of transcription factor binding sites. Here, we describe an application of the Illumina sequencing (formerly Solexa sequencing) platform to study mRNA expression levels. Our goals were to estimate technical variance associated with Illumina sequencing in this context and to compare its ability to identify differentially expressed genes with existing array technologies. To do so, we estimated gene expression differences between liver and kidney RNA samples using multiple sequencing replicates, and compared the sequencing data to results obtained from Affymetrix arrays using the same RNA samples. We find that the Illumina sequencing data are highly replicable, with relatively little technical variation, and thus, for many purposes, it may suffice to sequence each mRNA sample only once (i.e., using one lane). The information in a single lane of Illumina sequencing data appears comparable to that in a single array in enabling identification of differentially expressed genes, while allowing for additional analyses such as detection of low-expressed genes, alternative splice variants, and novel transcripts. Based on our observations, we propose an empirical protocol and a statistical framework for the analysis of gene expression using ultra-high-throughput sequencing technology.
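As an illustration of what "estimating technical variance" across sequencing replicates can look like, here is a minimal sketch. The abstract does not describe the statistical framework, so this sketch assumes one: if lanes differ only by sampling noise, a gene's read counts should be roughly Poisson with a mean proportional to each lane's total read depth. The function name `lane_dispersion_statistic` and its arguments are hypothetical. It computes a Pearson chi-square statistic for one gene. Values far above the degrees of freedom would suggest variation beyond what simple sampling explains.

```python
def lane_dispersion_statistic(gene_counts, lane_totals):
    """Pearson chi-square statistic for one gene across replicate lanes.

    Assumption (not taken from the abstract): under purely technical
    sampling noise, the count in each lane is Poisson with a mean
    proportional to that lane's total number of mapped reads.

    Returns (statistic, degrees_of_freedom).
    """
    if len(gene_counts) != len(lane_totals) or len(gene_counts) < 2:
        raise ValueError("need matching counts and totals for at least 2 lanes")
    if any(total <= 0 for total in lane_totals):
        raise ValueError("lane totals must be positive")

    degrees_of_freedom = len(gene_counts) - 1
    gene_total = sum(gene_counts)
    if gene_total == 0:
        # No reads in any lane: nothing to test.
        return 0.0, degrees_of_freedom

    # Pooled estimate of the fraction of all reads that map to this gene.
    rate = gene_total / sum(lane_totals)

    statistic = 0.0
    for count, depth in zip(gene_counts, lane_totals):
        expected = rate * depth
        statistic += (count - expected) ** 2 / expected
    return statistic, degrees_of_freedom


# Hypothetical counts for one gene in three lanes of differing depth.
stat, dof = lane_dispersion_statistic([10, 20, 30], [1000, 2000, 3000])
print(stat, dof)
```

The statistic can be compared with a chi-square distribution on the returned degrees of freedom to obtain a p-value. That step is left out here to keep the sketch dependency-free.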
