PRIGSA: Protein repeat identification by graph spectral analysis

Repetition of a structural motif within a protein is associated with a wide range of structural and functional roles. In most cases the repeating units are well conserved at the structural level, while at the sequence level they are mostly undetectable, suggesting the need for structure-based methods. Since most known methods require a training dataset, a de novo approach is desirable. Here, we propose an efficient graph-based approach for detecting structural repeats in proteins. In a protein structure represented as a graph, inter- and intra-repeat interactions are well captured by the eigen spectra of the adjacency matrix of the graph. These conserved interactions give rise to similar connections and a unique profile of the principal eigen spectra for each repeating unit. The efficacy of the approach is shown on eight repeat families annotated in UniProt, comprising both solenoid and nonsolenoid repeats with varied secondary structure architecture and repeat lengths. The approach is also tested on other known benchmark datasets, and its performance is compared with that of two repeat identification methods. For a known repeat type, the algorithm also identifies the type of repeat present in the protein. A web tool implementing the algorithm is available at http://bioinf.iiit.ac.in/PRIGSA/.
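As a rough illustration of the underlying idea, the sketch below builds a residue contact graph from C-alpha coordinates and computes the principal eigenvector of its adjacency matrix by power iteration. It is not the PRIGSA implementation: the function names, the 7.0 angstrom cutoff, and the three-point toy coordinates are assumptions made for this example only, and the abstract does not specify how the graph is constructed.

```python
import math


def contact_graph(coords, cutoff=7.0):
    """Return a 0/1 adjacency matrix linking residues whose C-alpha atoms
    lie within `cutoff` angstroms (the cutoff value is an assumption)."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def principal_eigenvector(adj, iterations=500):
    """Approximate the eigenvector of the largest eigenvalue of `adj`.

    Power iteration is applied to A + I rather than A: the shift keeps the
    same eigenvectors but avoids oscillation on bipartite graphs.
    """
    n = len(adj)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(adj[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical toy input: three residues on a line, 5 angstroms apart,
# so only neighbouring residues are in contact.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_adj = contact_graph(toy_coords)
toy_profile = principal_eigenvector(toy_adj)
```

Plotting the eigenvector components against residue index gives a per-residue profile. On a real repeat protein, the abstract's claim is that each repeating unit would show a similar pattern in such a profile.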
