Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines.

We describe the details of a serial analysis of gene expression (SAGE) library construction and analysis platform that has enabled the generation of >298 high-quality SAGE libraries and >30 million SAGE tags, primarily from sub-microgram amounts of total RNA purified from microdissected samples. Several RNA isolation methods were used to accommodate the diversity of samples processed, and various measures were applied to minimize carryover contamination from ditag PCR products. Modifications to the SAGE protocol improved cloning and DNA sequencing efficiencies. We implemented bioinformatic procedures that automatically assess DNA sequencing results for ditag structural integrity, linker or cross-species ditag contamination, and the yield of high-quality tags per sequence read. Our analysis of singleton tag errors led to a method for correcting such errors and statistically determining tag accuracy. From the libraries generated, we produced an essentially complete mapping of reliable 21-base-pair tags to the mouse reference genome sequence for a meta-library of approximately 5 million tags. Our analyses also led us to reject the commonly held notion that duplicate ditags are artifacts. Rather than following the usual practice of discarding such ditags, we conclude that they should be retained to avoid introducing bias and to preserve the quantitative nature of the data, which is a major theoretical advantage of SAGE as a tool for global transcriptional profiling.
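
The tag-extraction and counting steps summarized above can be illustrated with a short Python sketch. It is not the authors' pipeline. It makes several assumptions. Reads are LongSAGE-style concatemers in which ditags are delimited by the NlaIII anchoring-enzyme site CATG. Each 21-bp tag is CATG plus 17 bp. The ditag length window (`MIN_INTERIOR`, `MAX_INTERIOR`), the singleton threshold, and all function names are hypothetical. Duplicate ditags are retained by default, consistent with the conclusion above, and can be collapsed for comparison. The singleton step is a simplified one-mismatch reassignment. It is not the statistical error-correction method described in the abstract.

```python
from collections import Counter

# Assumptions (not from the source): ditags are delimited by CATG, each tag is
# CATG + 17 bp (21 bp total), and the accepted ditag interior length range
# below is a hypothetical filter for structurally intact ditags.
ANCHOR = "CATG"
TAG_BODY = 17
MIN_INTERIOR, MAX_INTERIOR = 32, 36  # hypothetical

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def ditags_from_read(read: str) -> list[str]:
    """Split a concatemer read on CATG; keep interiors of plausible ditag length.

    The first and last fragments are dropped because they are not bounded by
    CATG on both sides (they contain vector or linker sequence).
    """
    interiors = read.upper().split(ANCHOR)[1:-1]
    return [f for f in interiors
            if MIN_INTERIOR <= len(f) <= MAX_INTERIOR and set(f) <= set("ACGT")]


def tags_from_ditag(interior: str) -> tuple[str, str]:
    """Return both 21-bp tags of a ditag, each read 5'->3' from its own CATG.

    CATG is its own reverse complement, so the second tag is CATG plus the
    first 17 bp of the reverse-complemented interior.
    """
    return ANCHOR + interior[:TAG_BODY], ANCHOR + revcomp(interior)[:TAG_BODY]


def count_tags(reads: list[str], collapse_duplicate_ditags: bool = False) -> Counter:
    """Count tags across reads; duplicate ditags are kept unless collapsing is requested."""
    seen: set[str] = set()
    counts: Counter = Counter()
    for read in reads:
        for ditag in ditags_from_read(read):
            if collapse_duplicate_ditags:
                # A ditag and its reverse complement represent the same molecule.
                key = min(ditag, revcomp(ditag))
                if key in seen:
                    continue
                seen.add(key)
            counts.update(tags_from_ditag(ditag))
    return counts


def correct_singletons(counts: Counter, min_parent_count: int = 10) -> Counter:
    """Reassign each singleton to the unique abundant tag exactly one mismatch away.

    Simplified illustration only; min_parent_count is a hypothetical threshold.
    Singletons with zero or several candidate parents are left unchanged.
    """
    corrected = Counter(counts)
    abundant = [t for t, c in counts.items() if c >= min_parent_count]
    for tag, c in counts.items():
        if c != 1:
            continue
        parents = [p for p in abundant
                   if len(p) == len(tag)
                   and sum(a != b for a, b in zip(p, tag)) == 1]
        if len(parents) == 1:
            corrected[parents[0]] += 1
            del corrected[tag]
    return corrected
```

Running `count_tags(reads)` and `count_tags(reads, collapse_duplicate_ditags=True)` on the same reads shows directly how discarding duplicate ditags changes tag counts. Treating a ditag and its reverse complement as one key keeps the comparison independent of ditag orientation within the concatemer.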
