Mining and survey of simple sequence repeats in expressed sequence tags of dicotyledonous species.

Simple sequence repeat (SSR) markers are widely used in many plant and animal genomes because of their abundance, hypervariability, and suitability for high-throughput analysis. Developing SSR markers with molecular methods is time-consuming, laborious, and expensive. Computational approaches that mine the ever-increasing volume of sequences in public databases, such as expressed sequence tags (ESTs), permit rapid and economical discovery of SSRs. Most such efforts to date have focused on mining SSRs from monocotyledonous ESTs. In this study, we computationally mined and examined the abundance of SSRs in more than 1.54 million ESTs belonging to 55 dicotyledonous species. The frequency of ESTs containing SSRs ranged from 2.65% to 16.82% among species. Dinucleotide repeats were the most abundant, followed by tri- or mono-nucleotide repeats. The motifs A/T, AG/GA/CT/TC, and AAG/AGA/GAA/CTT/TTC/TCT were the predominant mono-, di-, and tri-nucleotide SSRs, respectively. Most of the mononucleotide SSRs contained 15-25 repeats, whereas the majority of the di- and tri-nucleotide SSRs contained 5-10 repeats. The comprehensive SSR survey data presented here demonstrate the potential of in silico mining of ESTs for rapid development of SSR markers for genetic analysis and applications in dicotyledonous crops.
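As an illustration of what in silico SSR mining involves, the following is a minimal sketch of a regular-expression scan for mono-, di-, and tri-nucleotide repeats in a single EST. It is not the pipeline used in the study, which the abstract does not describe. The function name `find_ssrs`, the `MIN_REPEATS` thresholds (15, 5, and 5 repeats), and the example sequence are all assumptions chosen for illustration.

```python
import re

# Minimum repeat counts per motif length. These values are assumptions
# made for this sketch, not the thresholds used in the study.
MIN_REPEATS = {1: 15, 2: 5, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a list of (motif, repeat_count, start) tuples found in seq."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip di/tri motifs made of one base (AA, AAA, ...): those are
            # mononucleotide runs and are handled by the motif_len == 1 pass.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            repeats = len(match.group(0)) // motif_len
            hits.append((motif, repeats, match.start()))
    return hits


# Hypothetical EST fragment containing one SSR of each class.
example_est = "CC" + "AG" * 6 + "TTC" + "A" * 16 + "C" + "AAG" * 5 + "T"
for motif, repeats, start in find_ssrs(example_est):
    print(f"({motif}){repeats} at position {start}")
```

Applying such a scan to every EST of a species and dividing the number of ESTs with at least one hit by the total number of ESTs would give a per-species frequency of the kind reported above, although the exact figures depend on the thresholds chosen.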
