Rapid motif-based prediction of circular permutations in multi-domain proteins

MOTIVATION: Rearrangements of protein domains and motifs, such as swaps and circular permutations (CPs), can produce erroneous results when sequence databases are searched with traditional methods based on linear sequence alignment. Circular permutations are also biologically relevant because they can help to explain both protein evolution and protein function.

RESULTS: We have developed an algorithm, RASPODOM, which is based on the classical recursive alignment scheme. Sequences are represented as strings of domains taken from precompiled domain (motif) databases such as ProDom. The algorithm runs several orders of magnitude faster than a reimplementation of the existing CP detection algorithm, which works on strings of amino acids. It produces virtually no false positives and allows true CPs to be discriminated from 'intermediate' CPs (iCPs). Several true CPs that have not been reported in the literature so far could be identified from Swiss-Prot/TrEMBL within minutes.
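To illustrate what it means to treat proteins as strings of domains, the following minimal sketch checks whether one domain string is an exact circular permutation of another. It is not the RASPODOM algorithm: it replaces the recursive alignment with a plain rotation test, tolerates no inserted, missing or diverged domains, and cannot distinguish true CPs from iCPs. The function name `rotation_offset` and the domain identifiers are hypothetical and chosen only for illustration.

```python
from typing import Optional, Sequence


def rotation_offset(a: Sequence[str], b: Sequence[str]) -> Optional[int]:
    """Return k (0 < k < len(a)) such that b == a[k:] + a[:k], else None.

    Each element is a domain identifier, so a protein is compared as a
    string of domains rather than as a string of amino acids.
    """
    a_list, b_list = list(a), list(b)
    n = len(a_list)
    # Different lengths, fewer than two domains, or identical order:
    # none of these counts as a circular permutation here.
    if n != len(b_list) or n < 2 or a_list == b_list:
        return None
    # Every rotation of a appears as a window of length n in a + a.
    doubled = a_list + a_list
    for k in range(1, n):
        if doubled[k:k + n] == b_list:
            return k
    return None


# Hypothetical domain strings (identifiers are made up for this example).
protein_1 = ["DOM_A", "DOM_B", "DOM_C"]
protein_2 = ["DOM_C", "DOM_A", "DOM_B"]   # same domains, rotated
protein_3 = ["DOM_B", "DOM_A", "DOM_C"]   # a swap, not a rotation

print(rotation_offset(protein_1, protein_2))
print(rotation_offset(protein_1, protein_3))
```

Because the comparison runs over a handful of domain identifiers instead of hundreds of residues, each pairwise test is cheap. An alignment-based method, as described in the abstract, is still needed once domain strings differ by more than a pure rotation.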
