Human Trash ESTs: Sequences from cDNA Collections That Are Not Aligned to the Genome Assembly

Expressed sequence tags (ESTs) are 500- to 1,000-bp-long sequences corresponding to mRNAs derived from different sources (cell lines, tissues, etc.). The human EST database contains over 8,000,000 sequences, with over 4,000,000,000 nucleotides in total. Because RNA molecules are transcribed from a genomic DNA template, every EST should match the genome of the organism it was derived from. Nevertheless, we found approximately 11,000 ESTs in the human EST database that do not match any sequence in the human genome database. The presence of these "trash" ESTs (TESTs) in the EST database could result from DNA or RNA contamination of laboratory equipment, tissues, or cell lines. TESTs could also represent sequences from unidentified human genes or from species inhabiting the human body. Here, we attempt to identify the sources of contamination in the human EST database. In particular, we discuss systematic contamination of mammalian EST databases with plant sequences.
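The core test behind the definition of a TEST (a transcript-derived read should be found somewhere in its source genome, on either strand) can be illustrated as a simple filter. The Python below is a minimal sketch under stated assumptions: it replaces real sequence alignment with exact k-mer matching, and the function names `find_unaligned`, `build_index`, `kmers`, and `reverse_complement`, as well as the parameters `k` and `min_hits`, are hypothetical illustrations, not the procedure used in the study.

```python
"""Hypothetical sketch: flag 'trash' ESTs that share no k-mer with a reference.

Exact k-mer matching is a simplified stand-in for a real aligner; it is an
illustration only, not the procedure used in the study.
"""
from typing import Dict, Iterable, List, Set

# Complement table; characters other than A, C, G, T, N pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct length-k substrings of seq (empty if len(seq) < k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(genome_chunks: Iterable[str], k: int) -> Set[str]:
    """Collect k-mers from each genome chunk.

    K-mers spanning chunk boundaries are not indexed in this sketch.
    """
    index: Set[str] = set()
    for chunk in genome_chunks:
        index |= kmers(chunk, k)
    return index


def find_unaligned(ests: Dict[str, str], genome_chunks: Iterable[str],
                   k: int = 20, min_hits: int = 1) -> List[str]:
    """Return names of ESTs with fewer than min_hits k-mers found in the genome.

    Both strands of each EST are queried, because a cDNA read may come from
    either strand. ESTs shorter than k have no k-mers and are always flagged.
    """
    index = build_index(genome_chunks, k)
    unaligned = []
    for name, seq in ests.items():
        query = kmers(seq, k) | kmers(reverse_complement(seq), k)
        hits = sum(1 for km in query if km in index)
        if hits < min_hits:
            unaligned.append(name)
    return unaligned


if __name__ == "__main__":
    genome = ["ATGCGTACGTTAGCATCGGATCCAGTTACGATCGA"]  # toy "genome"
    ests = {
        "est_forward": genome[0][3:18],                       # exact forward match
        "est_reverse": reverse_complement(genome[0][10:25]),  # matches the other strand
        "est_foreign": "G" * 12,                              # absent from the toy genome
    }
    print(find_unaligned(ests, genome, k=8))  # ['est_foreign']
```

The defaults `k=20` and `min_hits=1` are arbitrary illustrative values, and the in-memory set index only suits toy inputs. A genome-scale search would normally use a dedicated aligner such as BLAST or BLAT with identity and coverage thresholds rather than exact seeds.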
