Next-generation tag sequencing for cancer gene expression profiling.

We describe a new method, Tag-seq, which uses ultra-high-throughput sequencing of 21-bp cDNA tags for sensitive and cost-effective gene expression profiling. Compared with LongSAGE data, Tag-seq data showed improved representation of several classes of rare transcripts, including transcription factors, antisense transcripts, and intronic sequences, the last possibly representing novel exons or genes. We observed increases in the diversity, abundance, and dynamic range of such rare transcripts, and we used the greater dynamic range of expression to identify altered expression ratios of alternative transcript isoforms in cancer versus normal libraries. The strand-specific information in Tag-seq reads further allowed us to detect altered expression ratios of sense and antisense (S-AS) transcripts between cancer and normal libraries. S-AS transcripts were enriched in known cancer genes, while transcript isoforms were enriched in miRNA target sites. Transcript abundance showed a stronger GC bias in LongSAGE than in Tag-seq, such that AT-rich tags were less abundant than GC-rich tags in LongSAGE. Tag-seq also performed better in gene discovery: it identified >98% of the genes detected by LongSAGE and profiled a distinct subset of the transcriptome, characterized by AT-rich genes, that was expressed at levels below the detection limit of LongSAGE. Overall, Tag-seq is sensitive to rare transcripts, has less sequence-composition bias than LongSAGE, and allows differential expression analysis for a wider range of transcripts, including those encoding important regulatory molecules.
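The abstract does not say which statistical test was used to call altered S-AS or isoform expression ratios. As a minimal sketch, assuming tag counts are arranged in a 2x2 table (cancer and normal libraries as rows; sense and antisense tags, or isoform A and isoform B tags, as columns) and compared with a two-sided Fisher's exact test, a common choice for count data but not necessarily the authors' method, the comparison could look like this. The function names `fisher_exact_two_sided` and `strand_ratio_test` and the example counts are hypothetical.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, total: int) -> float:
    """Probability of `a` in the top-left cell of a 2x2 table with fixed margins."""
    return comb(col1, a) * comb(total - col1, row1 - a) / comb(total, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of every table with the same row and column
    margins that is no more likely than the observed table.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    lo = max(0, row1 - (total - col1))  # smallest feasible top-left count
    hi = min(row1, col1)                # largest feasible top-left count
    p_obs = hypergeom_pmf(a, row1, col1, total)
    p = 0.0
    for x in range(lo, hi + 1):
        px = hypergeom_pmf(x, row1, col1, total)
        if px <= p_obs * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(1.0, p)


def strand_ratio_test(cancer_sense: int, cancer_antisense: int,
                      normal_sense: int, normal_antisense: int):
    """Hypothetical helper: compare the sense-tag fraction in a cancer and a normal library.

    Returns (cancer sense fraction, normal sense fraction, two-sided p-value).
    Each library must have at least one tag for the locus.
    """
    cancer_frac = cancer_sense / (cancer_sense + cancer_antisense)
    normal_frac = normal_sense / (normal_sense + normal_antisense)
    p = fisher_exact_two_sided(cancer_sense, cancer_antisense,
                               normal_sense, normal_antisense)
    return cancer_frac, normal_frac, p


# Illustrative (made-up) counts: 8 sense / 2 antisense tags in cancer,
# 1 sense / 5 antisense tags in normal.
print(strand_ratio_test(8, 2, 1, 5))  # p ≈ 0.035 (exactly 35/1001)
```

The same function applies unchanged to isoform tags by replacing sense and antisense counts with counts for two alternative isoforms. With real libraries, one would also correct for testing many loci (for example, a false discovery rate procedure) and for differences in library size.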
