Finding approximate tandem repeats in genomic sequences

An efficient algorithm is presented for detecting approximate tandem repeats in genomic sequences. The algorithm is based on a flexible statistical model that accommodates a wide range of definitions of approximate tandem repeats. The ideas and methods underlying the algorithm are described and examined, and its effectiveness on genomic data is demonstrated.
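
To make the term concrete, the sketch below illustrates one simple notion of an approximate tandem repeat: adjacent copies of a fixed period that differ by at most a few substitutions. It is not the statistical model or the algorithm of the paper, which the abstract does not specify. The function names, the substitution-only (Hamming) criterion, and the brute-force scan are all assumptions made for illustration.

```python
def adjacent_copy_mismatches(seq, start, period, copies):
    """Count positions where a base differs from the base one period later.

    Hypothetical helper: it compares each of the first (copies - 1) units
    with the unit that follows it, allowing substitutions only (no indels).
    """
    end = start + period * (copies - 1)
    return sum(1 for i in range(start, end) if seq[i] != seq[i + period])


def scan_for_repeats(seq, period, copies, max_mismatches):
    """Return every start index whose window of `copies` units of length
    `period` has at most `max_mismatches` adjacent-copy mismatches.

    This is a naive O(n * period * copies) scan, for illustration only.
    """
    window = period * copies
    return [
        start
        for start in range(len(seq) - window + 1)
        if adjacent_copy_mismatches(seq, start, period, copies) <= max_mismatches
    ]


if __name__ == "__main__":
    # An exact repeat of "ACG" has no mismatches between adjacent copies.
    print(adjacent_copy_mismatches("ACGACGACG", 0, 3, 3))
    # One substitution (G -> T) in the third copy is counted twice, because
    # that copy is compared with both of its neighbours.
    print(adjacent_copy_mismatches("ACGACGACTACG", 0, 3, 4))
    print(scan_for_repeats("TTACGACGACTACGTT", 3, 4, 2))
```

Different choices of period, copy count, and mismatch tolerance give different definitions of what counts as a repeat, which is the kind of flexibility the abstract refers to; a practical detector would also need to handle insertions and deletions and to avoid the exhaustive scan.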
