Finding approximate tandem repeats in genomic sequences.

An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model that allows a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described, and its effectiveness on genomic data is demonstrated.
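To make the notion of an approximate tandem repeat concrete, the following minimal sketch scores a window of a sequence by comparing each base with the base one period downstream. It is an illustration only, not the algorithm or the statistical model presented in the paper; the function names `match_fraction` and `candidate_starts`, the fixed `period` and `copies` parameters, and the `min_identity` threshold are all assumptions introduced here, and indels are ignored.

```python
def match_fraction(seq, start, period, copies):
    """Fraction of bases in the window that equal the base one period later.

    The window covers `copies` adjacent units of length `period`, beginning
    at `start`. A value of 1.0 means an exact tandem repeat; lower values
    mean the adjacent copies differ by substitutions.
    """
    if period < 1 or copies < 2:
        raise ValueError("need period >= 1 and at least two copies")
    span = period * (copies - 1)  # positions that have a partner one period later
    if start < 0 or start + span + period > len(seq):
        raise ValueError("window runs past the end of the sequence")
    matches = sum(seq[i] == seq[i + period] for i in range(start, start + span))
    return matches / span


def candidate_starts(seq, period, copies, min_identity):
    """Start positions whose window reaches the (hypothetical) identity threshold."""
    window = period * copies
    return [
        s
        for s in range(len(seq) - window + 1)
        if match_fraction(seq, s, period, copies) >= min_identity
    ]
```

Lowering `min_identity` below 1.0 admits windows whose copies differ by a few substitutions, which is one simple way to vary the definition of "approximate"; a practical detector would also need to handle insertions, deletions, and unknown period lengths.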
