RepeatsDB-lite: a web server for unit annotation of tandem repeat proteins

Abstract

RepeatsDB-lite (http://protein.bio.unipd.it/repeatsdb-lite) is a web server for predicting repetitive structural elements and units in tandem repeat (TR) proteins. TRs are a widespread but poorly annotated class of non-globular proteins with heterogeneous functions. RepeatsDB-lite extends prediction to all TR types and substantially improves on previous methods in both computational time and accuracy, with precision above 95% for solenoid structures. The algorithm uses an improved TR unit library derived from the RepeatsDB database to perform an iterative structural search and assignment. The web interface provides tools for analyzing the evolutionary relationships between units and for manually refining the prediction by changing unit positions and protein classification. An all-against-all, structure-based sequence similarity matrix is calculated and visualized in real time after every user edit. User-refined predictions can be submitted to RepeatsDB for review and inclusion.
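The all-against-all similarity matrix between units can be illustrated with a minimal sketch. It assumes the units have already been placed in a gapped multiple alignment (for example, one derived from a structural superposition) and scores each pair by percent identity over the columns where neither unit has a gap. The function names `pairwise_identity` and `similarity_matrix`, and this particular identity definition, are hypothetical choices for illustration, not the scoring actually used by RepeatsDB-lite.

```python
from typing import List


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two gapped, equal-length alignment rows.

    Assumption (illustrative only): identity is counted over columns where
    neither row has a gap ('-'). Returns 0.0 if no ungapped column is shared.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    # Keep only columns where both units have a residue.
    shared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not shared:
        return 0.0
    matches = sum(1 for x, y in shared if x == y)
    return 100.0 * matches / len(shared)


def similarity_matrix(units: List[str]) -> List[List[float]]:
    """All-against-all identity matrix for a list of aligned repeat units."""
    n = len(units)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = pairwise_identity(units[i], units[j])
            # The score is symmetric, so fill both triangles at once.
            matrix[i][j] = matrix[j][i] = score
    return matrix
```

For example, `similarity_matrix(["LKELDL", "LRELNL", "L-EL-L"])` gives about 66.7% identity between the first two units and 100% between each of them and the gapped third unit. The cost is O(n² L) for n units of aligned length L, so recomputing the whole matrix after each edit to a unit boundary remains inexpensive when the number of units is modest.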