RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Abstract The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and introduces an extended classification scheme. The major conceptual change compared to the previous version is a hierarchical classification that combines top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) that require sequence similarity and describe repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.

[1]  J Heringa,et al.  Detection of internal repeats: how common are they? , 1998, Current opinion in structural biology.

[2]  Daniel B. Roche,et al.  TAPO: A combined method for the identification of tandem repeats in protein structures , 2015, FEBS letters.

[3]  Manfred J. Sippl,et al.  Detecting Repetitions and Periodicities in Proteins by Tiling the Structural Space , 2013, The journal of physical chemistry. B.

[4]  Daniel B. Roche,et al.  Classification of β-hairpin repeat proteins. , 2017, Journal of structural biology.

[5]  Ian Sillitoe,et al.  CATH: expanding the horizons of structure-based functional annotations for genome sequences , 2018, Nucleic Acids Res..

[6]  A. Bateman,et al.  Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases , 2019, Nucleic acids research.

[7]  Lisa D. Cabrita,et al.  Systematic mapping of free energy landscapes of a growing filamin domain during biosynthesis , 2018, Proceedings of the National Academy of Sciences.

[8]  A. Elofsson,et al.  A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder , 2020, Genes.

[9]  Silvio C. E. Tosatto,et al.  Identification of repetitive units in protein structures with ReUPred , 2016, Amino Acids.

[10]  Jeffrey Heer,et al.  SpanningAspectRatioBank Easing FunctionS ArrayIn ColorIn Date Interpolator MatrixInterpola NumObjecPointI Rectang ISchedu Parallel Pause Scheduler Sequen Transition Transitioner Transiti Tween Co DelimGraphMLCon IData JSONCon DataField DataSc Dat DataSource Data DataUtil DirtySprite LineS RectSprite , 2011 .

[11]  Lucy R Forrest,et al.  MemSTATS: A Benchmark Set of Membrane Protein Symmetries and Pseudo-Symmetries. , 2019, Journal of molecular biology.

[12]  Ezequiel A. Galpern,et al.  Large Ankyrin repeat proteins are formed with similar and energetically favorable units , 2020, PloS one.

[13]  Silvio C. E. Tosatto,et al.  RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures , 2012, Bioinform..

[14]  B. Kobe,et al.  When protein folding is simplified to protein coiling: the continuum of solenoid protein structures. , 2000, Trends in biochemical sciences.

[15]  Silvio C. E. Tosatto,et al.  InterPro in 2019: improving coverage, classification and access to protein sequence annotations , 2018, Nucleic Acids Res..

[16]  D. Barford,et al.  Topological characteristics of helical repeat proteins. , 1999, Current opinion in structural biology.

[17]  Radka Svobodová Vařeková,et al.  LiteMol suite: interactive web-based visualization of large-scale macromolecular structure data , 2017, Nature Methods.

[18]  Maria W. Górna,et al.  Self-analysis of repeat proteins reveals evolutionarily conserved patterns , 2020, BMC Bioinformatics.

[19]  C. Ponting,et al.  Protein repeats: structures, functions, and evolution. , 2001, Journal of structural biology.

[20]  Andrey V Kajava,et al.  Tandem repeats in proteins: from sequence to structure. , 2012, Journal of structural biology.

[21]  Silvio C. E. Tosatto,et al.  RepeatsDB: a database of tandem repeat protein structures , 2013, Nucleic Acids Res..

[22]  Damiano Piovesan,et al.  The Feature-Viewer: a visualization tool for positional annotations on a sequence , 2020, Bioinform..

[23]  David S. Goodsell,et al.  RCSB Protein Data Bank: biological macromolecular structures enabling research and education in fundamental biology, biomedicine, biotechnology and energy , 2018, Nucleic Acids Res..

[24]  Maria Jesus Martin,et al.  SIFTS: updated Structure Integration with Function, Taxonomy and Sequences resource allows 40-fold increase in coverage of structure-based annotations for proteins , 2018, Nucleic Acids Res..

[25]  Andreas Prlić,et al.  Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm , 2018, bioRxiv.

[26]  Bioschemas Community Bioschemas: From Potato Salad to Protein Annotation , 2017, ISWC 2017.

[27]  Damiano Piovesan,et al.  A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication. , 2020, Journal of structural biology.

[28]  Adam Godzik,et al.  ConSole: using modularity of Contact maps to locate Solenoid domains in protein structures , 2014, BMC Bioinformatics.

[29]  Silvio C. E. Tosatto,et al.  RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins , 2018, Nucleic Acids Res..

[30]  William R Taylor,et al.  A Fourier analysis of symmetry in protein structure. , 2002, Protein engineering.

[31]  Denise Gorse,et al.  Wavelet transforms for the characterization and detection of repeating motifs. , 2002, Journal of molecular biology.

[32]  The UniProt Consortium,et al.  UniProt: a worldwide hub of protein knowledge , 2018, Nucleic Acids Res..

[33]  William R Taylor,et al.  Toward the detection and validation of repeats in protein structure , 2004, Proteins.

[34]  Silvio C. E. Tosatto,et al.  The Pfam protein families database in 2019 , 2018, Nucleic Acids Res..

[35]  J. Gough,et al.  The SCOP database in 2020: expanded classification of representative family and superfamily domains of known protein structures , 2019, Nucleic Acids Res..

[36]  Carole A. Goble,et al.  Bioschemas: From Potato Salad to Protein Annotation , 2017, SEMWEB.