Search and classification of potential minisatellite sequences from bacterial genomes.

We applied our Information Decomposition method to identify regions of latent dinucleotide periodicity in bacterial genomes. At a high level of statistical significance, it yielded 454 potential minisatellite sequences. We then classified their periodicity matrices and obtained 45 classes. Using these classes, we applied a second new method that we developed, Modified Profile Analysis, to reveal additional periodic sequences in the presence of indels. Together, the two methods found 3949 sequences. Most of these cannot be detected by other methods, including dynamic programming and Fourier transformation.
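To make the idea of a periodicity matrix concrete, the following is a minimal sketch and not the authors' implementation. It assumes that, for a trial period, one counts which nucleotide occurs at each position within the period and scores the period by the mutual information between position and nucleotide. The function names `periodicity_matrix` and `mutual_information` are hypothetical, and the sketch ignores both the statistical significance estimation and the indel handling described in the abstract.

```python
import math

ALPHABET = "ACGT"  # assumption: plain DNA alphabet, other symbols are skipped


def periodicity_matrix(seq, period):
    """Count nucleotides at each position modulo `period`.

    Returns a list of `period` rows, each with one count per letter of ALPHABET.
    """
    counts = [[0] * len(ALPHABET) for _ in range(period)]
    for i, base in enumerate(seq.upper()):
        j = ALPHABET.find(base)
        if j >= 0:  # ignore ambiguous symbols such as N
            counts[i % period][j] += 1
    return counts


def mutual_information(matrix):
    """Mutual information (in bits) between row index and column index."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    mi = 0.0
    for r, row in enumerate(matrix):
        for c, n in enumerate(row):
            if n:  # empty cells contribute nothing
                mi += (n / total) * math.log2(n * total / (row_sums[r] * col_sums[c]))
    return mi


if __name__ == "__main__":
    seq = "ACGT" * 10
    for period in (3, 4, 5):
        score = mutual_information(periodicity_matrix(seq, period))
        print(period, round(score, 3))
```

Scanning a range of trial periods and keeping those with an unusually high score is one simple way to flag candidate periodic regions. A usable pipeline would also need a significance estimate for each score, for example by comparison with shuffled sequences.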
