Searching microsatellites in DNA sequences: approaches used and tools developed

Genomic activities and evolutionary changes associated with microsatellite instability have led to a renewed focus on microsatellite research. In the last decade, a number of microsatellite mining tools have been introduced, based on different computational approaches. The choice is generally between slow but exhaustive approaches based on dynamic programming and fast but incomplete heuristic methods. Tools based on stochastic approaches are more popular because of their simplicity and additional ornamental features. We have performed a comparative evaluation of the relative efficiency of some microsatellite search tools using their default settings. The graphical user interface, the statistical analysis of the output, and the ability to mine imperfect repeats are the most important criteria in selecting a tool for a particular investigation. However, none of the available tools alone provides complete and accurate information about microsatellites, and much depends on the discretion of the user.
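
To make the trade-off between fast and exhaustive searches concrete, the following is a minimal sketch of the simplest kind of fast search: a scan for perfect (exact) tandem repeats only. It does not reproduce any of the tools evaluated here; the function name, the motif length range of 1 to 6 bp and the minimum of three copies are assumptions chosen for illustration. Imperfect repeats, which dynamic programming based approaches can recover, are missed entirely by this kind of search.

```python
import re


def find_perfect_microsatellites(seq, min_unit=1, max_unit=6, min_repeats=3):
    """Return sorted (start, end, motif, copies) tuples for perfect tandem repeats.

    Hypothetical helper for illustration only: thresholds are assumed defaults,
    coordinates are 0-based and end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for unit in range(min_unit, max_unit + 1):
        # A motif of length `unit` followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are just a shorter motif repeated (e.g. "TT" for "T"),
            # so each tract is reported once under its shortest unit.
            redundant = any(
                unit % k == 0 and motif == motif[:k] * (unit // k)
                for k in range(1, unit)
            )
            if redundant:
                continue
            copies = (match.end() - match.start()) // unit
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_perfect_microsatellites("GGCACACACACATTAGAGAGTTTTTTC"):
        print(hit)
```

Even in this small sketch, the reported repeats depend on user-chosen thresholds such as the minimum copy number, which is one reason results differ between tools run with their default settings.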
