RepeatsDB: a database of tandem repeat protein structures

RepeatsDB (http://repeatsdb.bio.unipd.it/) is a database of annotated tandem repeat protein structures. Tandem repeats pose a difficult problem for the analysis of protein structures, as the underlying sequence can be highly degenerate. Several repeat types have been studied over the years, but their annotation was done on a case-by-case basis, making large-scale analysis difficult. We developed RepeatsDB to fill this gap. Using state-of-the-art repeat detection methods and manual curation, we systematically annotated the Protein Data Bank, predicting 10 745 repeat structures. In all, 2797 structures were classified according to a recently proposed classification schema, which was expanded to accommodate new findings. In addition, detailed annotations were performed on a subset of 321 proteins. These annotations record the start and end positions of the repeat regions and of the individual repeat units. RepeatsDB is an ongoing effort to systematically classify and annotate structural protein repeats in a consistent way. Users can access and download high-quality datasets either interactively or programmatically through web services.
