FireμSat: meeting the challenge of detecting microsatellites in DNA

In the context of this paper, microsatellites are patterns repeated consecutively in genomic sequences. Several algorithms detect microsatellites in DNA, using various computational techniques. Nevertheless, some molecular biologists find the output of these algorithms inadequate and instead visually scan hard copies of sequenced DNA to detect microsatellites. The purpose of this paper is threefold: to provide an overview of the literature on existing software that detects microsatellites, either directly or indirectly; to compile criteria with which algorithms should comply in order to search effectively for microsatellites; and to motivate the corresponding parameters, which have been designed to promote the usability of an analytical, effective algorithm for detecting microsatellites. (Details of this algorithm are reported elsewhere.)
