An Algorithm to Find All Identical Motifs in Multiple Biological Sequences

Sequence motifs are of great biological importance in nucleotide and protein sequences. The conserved occurrence of identical motifs indicates functional significance and helps to classify biological sequences. In this paper, a new algorithm is proposed to find all identical motifs in multiple nucleotide or protein sequences. The proposed algorithm is based on dynamic programming. Its applications include the identification of (a) conserved identical sequence motifs and (b) identical or direct repeat sequence motifs across multiple biological sequences (nucleotide or protein). The algorithm also supports the comparative analysis of internal sequence repeats for evolutionary studies, which helps to derive phylogenetic relationships from the distribution of repeats.
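The abstract does not give the details of the algorithm, so the following is only an illustrative sketch of how dynamic programming can find identical motifs shared by several sequences. It is not the authors' method. It fills the classic common-suffix table for the first two sequences, keeps the substrings that also occur in every other sequence, and reports the maximal ones. The function names (`common_substrings`, `shared_motifs`) and the `min_len` parameter are assumptions made for this example.

```python
def common_substrings(a, b, min_len):
    """Return every substring of length >= min_len that occurs in both a and b.

    Uses the common-suffix dynamic programming recurrence: the cell for
    (i, j) holds the length of the longest identical run ending at
    a[i-1] and b[j-1]. Only two rows of the table are kept in memory.
    """
    found = set()
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                # Record each suffix of the current run that is long enough.
                for length in range(min_len, curr[j] + 1):
                    found.add(a[i - length:i])
        prev = curr
    return found


def shared_motifs(seqs, min_len=3):
    """Return the maximal identical motifs present in every sequence (hypothetical helper)."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    motifs = common_substrings(seqs[0], seqs[1], min_len)
    # Keep only the candidates that also occur in the remaining sequences.
    for s in seqs[2:]:
        motifs = {m for m in motifs if m in s}
    # Drop motifs that are contained in a longer shared motif.
    return sorted(m for m in motifs
                  if not any(m != other and m in other for other in motifs))


if __name__ == "__main__":
    print(shared_motifs(["ACGTACGG", "TTACGTAA", "GGACGTCC"], min_len=3))
```

In this toy input the only maximal motif of length three or more shared by all three sequences is `ACGT`. The sketch favors clarity over speed: the pairwise table costs time proportional to the product of the two sequence lengths, and a suffix-based index would likely scale better for genome-sized inputs.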
