Ab initio detection of fuzzy amino acid tandem repeats in protein sequences

Background: Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the mechanisms by which proteins adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detecting repeating structures from sequence analysis alone is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence (fuzzy tandem repeats).

Results: In this paper we present PTRStalker, a new algorithm for ab initio detection of fuzzy tandem repeats in protein amino acid sequences. The reported results show that, when run on amino acid sequences from the UniProtKB/Swiss-Prot database, PTRStalker detects novel tandemly repeated structures not captured by other state-of-the-art tools. Experiments with membrane proteins indicate that PTRStalker can detect global symmetries in the primary structure which are then reflected in the tertiary structure.

Conclusions: PTRStalker is able to detect fuzzy tandem repeating structures in protein sequences, with performance beyond the current state of the art. Such a tool may be a valuable aid in investigating protein structural properties when X-ray data on the tertiary structure is not available.
