Comparison of CAGE and RNA-seq transcriptome profiling using clonally amplified and single-molecule next-generation sequencing

CAGE (cap analysis gene expression) and RNA-seq are two major technologies used to quantify transcript abundance and to characterize transcript structure. They measure expression by sequencing either from the 5' end of capped molecules (CAGE) or tags randomly distributed along the length of a transcript (RNA-seq). Library protocols for second-generation sequencing platforms that rely on clonal amplification (Illumina, SOLiD, 454 Life Sciences [Roche], Ion Torrent) typically employ PCR preamplification before clonal amplification, whereas third-generation, single-molecule sequencers can sequence unamplified libraries. Although each of these transcriptome profiling platforms has been shown to be reproducible on its own, no systematic comparison among them has been carried out. Here we compare CAGE on both second- and third-generation sequencers with RNA-seq on a second-generation sequencer, using a panel of RNA mixtures from two human cell lines to examine their power to discriminate biological states, their detection of differentially expressed genes, the linearity of their measurements, and the reproducibility of quantification. We found that quantified gene expression levels are largely comparable across platforms, and we conclude that CAGE and RNA-seq are complementary technologies that can be used together to improve incomplete gene models. We also found systematic biases in the second- and third-generation platforms, likely arising from steps such as linker ligation, restriction enzyme cleavage, and PCR amplification. This study provides a perspective on the performance of these platforms that should serve as a baseline for designing further experiments to tackle the complex transcriptomes uncovered in a wide range of cell types.
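One criterion above, linearity of measurements, rests on a simple idea: if a mixture contains a fraction alpha of RNA from cell line A, each gene's normalized expression in the mixture should lie close to alpha times its level in A plus (1 - alpha) times its level in B. The Python sketch below illustrates that check under stated assumptions and is not the authors' pipeline. It assumes tag counts normalized to tags per million, that both cell lines contribute equal mRNA per unit of total RNA (so expression mixes linearly), a log2 transform with a pseudocount of 1, and Pearson r plus a least-squares slope as linearity summaries. The function names (`to_tpm`, `expected_mixture`, `linearity_fit`) and the toy counts are hypothetical; the mixing ratios and statistics actually used in the study are not stated here.

```python
import math
from typing import Dict, Tuple


def to_tpm(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw tag counts to tags per million (library-size normalization)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no tags")
    return {gene: 1e6 * c / total for gene, c in counts.items()}


def expected_mixture(a: Dict[str, float], b: Dict[str, float],
                     alpha: float) -> Dict[str, float]:
    """Expected tags per million in a mixture holding a fraction `alpha` of sample A.

    Hypothetical simplification: A and B contribute equal mRNA per unit of
    total RNA, so normalized expression combines linearly.
    """
    genes = set(a) | set(b)
    return {g: alpha * a.get(g, 0.0) + (1.0 - alpha) * b.get(g, 0.0) for g in genes}


def linearity_fit(observed: Dict[str, float], expected: Dict[str, float],
                  pseudocount: float = 1.0) -> Tuple[float, float]:
    """Pearson r and least-squares slope of log2(observed) on log2(expected).

    A perfectly linear platform gives r == 1 and slope == 1.
    """
    genes = sorted(set(observed) | set(expected))
    x = [math.log2(expected.get(g, 0.0) + pseudocount) for g in genes]
    y = [math.log2(observed.get(g, 0.0) + pseudocount) for g in genes]
    n = len(genes)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("need variation across genes to assess linearity")
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Toy example with hypothetical counts: a 50:50 mixture of samples A and B.
a = to_tpm({"G1": 900, "G2": 100, "G3": 0})
b = to_tpm({"G1": 100, "G2": 100, "G3": 800})
expected = expected_mixture(a, b, alpha=0.5)
observed = to_tpm({"G1": 500, "G2": 100, "G3": 400})
r, slope = linearity_fit(observed, expected)
print(f"r = {r:.3f}, slope = {slope:.3f}")
```

In this toy case the observed mixture matches the expected values exactly, so both r and the slope are 1. A slope below 1 on real data would indicate compression of fold changes, one way the platform biases mentioned above could show up.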
