The use of MPSS for whole-genome transcriptional analysis in Arabidopsis.

We have generated 36,991,173 17-base sequence "signatures" representing transcripts from the model plant Arabidopsis. These data were derived by massively parallel signature sequencing (MPSS) from 14 libraries and comprised 268,132 distinct sequences. Comparable data were also obtained with 20-base signatures. We developed a method for handling these data and for comparing the signatures to the annotated Arabidopsis genome. As part of this procedure, 858,019 potential or "genomic" signatures were extracted from the Arabidopsis genome and classified according to their position and orientation relative to annotated genes. Comparing genomic and expressed signatures identified 67,735 matching signatures that were predicted to derive from distinct transcripts and were expressed at significant levels. Expressed signatures were derived from the sense strand of at least 19,088 of 29,084 annotated genes (about 66%). The same comparison showed that approximately 7.7% of genomic signatures were underrepresented in the expression data. These genomic signatures contained one of 20 four-base words that were consistently associated with reduced MPSS abundances. More than 89% of the sum of the expressed signature abundances matched the Arabidopsis genome, and many of the unmatched signatures found in high abundances were predicted to match previously uncharacterized transcripts.
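
The extraction and classification of genomic signatures described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes that each signature begins at a GATC (DpnII) anchor site and spans 17 bases in total, which is the usual MPSS design but is not stated in the abstract. The names `genomic_signatures`, `classify`, `Signature`, and `Gene`, and the three classes "sense", "antisense", and "intergenic", are hypothetical simplifications chosen for illustration.

```python
from typing import Iterable, Iterator, NamedTuple

ANCHOR = "GATC"   # assumed anchor site (DpnII); not stated in the abstract
SIG_LEN = 17      # signature length used for the main data set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


class Signature(NamedTuple):
    start: int   # 0-based leftmost coordinate on the forward strand
    strand: str  # "+" or "-"
    seq: str     # signature read 5'->3' on its own strand


class Gene(NamedTuple):
    start: int   # 0-based, inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def genomic_signatures(genome: str, sig_len: int = SIG_LEN) -> Iterator[Signature]:
    """Yield every potential signature anchored at a GATC site, on both strands."""
    genome = genome.upper()
    pos = genome.find(ANCHOR)
    while pos != -1:
        # Forward strand: the anchor plus the bases downstream of it.
        if pos + sig_len <= len(genome):
            yield Signature(pos, "+", genome[pos:pos + sig_len])
        # Reverse strand: GATC is palindromic, so the same site also anchors
        # a signature that extends to the left on the forward coordinates.
        end = pos + len(ANCHOR)
        if end - sig_len >= 0:
            yield Signature(end - sig_len, "-",
                            reverse_complement(genome[end - sig_len:end]))
        pos = genome.find(ANCHOR, pos + 1)


def classify(sig: Signature, genes: Iterable[Gene]) -> str:
    """Classify a signature by position and orientation relative to genes.

    Hypothetical three-way scheme: a signature fully inside a gene is
    "sense" or "antisense" depending on strand; anything else is "intergenic".
    """
    sig_end = sig.start + len(sig.seq)
    for gene in genes:
        if gene.start <= sig.start and sig_end <= gene.end:
            return "sense" if sig.strand == gene.strand else "antisense"
    return "intergenic"
```

Matching expressed signatures to the genome then reduces to looking up each observed 17-base sequence in a dictionary keyed by the `seq` field of these genomic signatures. A real annotation-based classification would also need to handle introns, exon boundaries, and signatures that occur more than once in the genome, which this sketch ignores.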
