Proteins comparison through probabilistic optimal structure local alignment

Multiple local structure comparison helps identify common structural motifs or conserved binding sites in the 3D structures of distantly related proteins. Because there is no single best way to compare structures and evaluate an alignment, a wide variety of techniques and similarity scoring schemes have been proposed. Existing algorithms usually compute the best superposition of two structures or recast the task as an optimization problem in a simpler setting (e.g., on contact maps or distance matrices). Here, we present PROPOSAL (PROteins comparison through Probabilistic Optimal Structure local ALignment), a stochastic algorithm based on iterative sampling for multiple local alignment of protein structures. Our method efficiently finds conserved motifs across a set of protein structures, and it computes only the distances between all pairs of residues in the structures. To assess the accuracy and effectiveness of PROPOSAL, we tested it on a few families of protein structures and compared it with two state-of-the-art tools for pairwise local alignment on a dataset of manually annotated motifs. PROPOSAL is available as a Java 2D standalone application or a command line program at http://ferrolab.dmi.unict.it/proposal/proposal.html.
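
As an illustration of why pairwise residue distances alone can support a comparison, the following minimal Python sketch builds an all-pairs distance matrix from residue coordinates and scores two matched residue subsets by the difference of their internal distances, with no superposition step. It is not the PROPOSAL implementation: the function names `distance_matrix` and `distance_rmsd`, the use of one representative point per residue, and the scoring formula are assumptions made only for this example.

```python
import math
from itertools import combinations


def distance_matrix(coords):
    """Return the symmetric matrix of Euclidean distances between all residue pairs.

    coords: list of (x, y, z) tuples, one representative point per residue
    (for example the C-alpha atom; this choice is an assumption of the sketch).
    """
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = math.dist(coords[i], coords[j])
        dist[i][j] = dist[j][i] = d
    return dist


def distance_rmsd(dist_a, idx_a, dist_b, idx_b):
    """Root-mean-square difference of internal distances of two matched subsets.

    idx_a[k] in structure A is taken to be matched with idx_b[k] in structure B.
    A hypothetical scoring function, not the one used by PROPOSAL.
    """
    if len(idx_a) != len(idx_b) or len(idx_a) < 2:
        raise ValueError("need two matched subsets of equal size >= 2")
    pairs = list(combinations(range(len(idx_a)), 2))
    squared = sum(
        (dist_a[idx_a[p]][idx_a[q]] - dist_b[idx_b[p]][idx_b[q]]) ** 2
        for p, q in pairs
    )
    return math.sqrt(squared / len(pairs))


# Toy data: structure B is structure A translated by (1, 1, 1).
structure_a = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
structure_b = [(x + 1.0, y + 1.0, z + 1.0) for x, y, z in structure_a]

matrix_a = distance_matrix(structure_a)
matrix_b = distance_matrix(structure_b)
score = distance_rmsd(matrix_a, [0, 1, 2], matrix_b, [0, 1, 2])
```

Because internal distances are unchanged by rotation and translation, the score for the translated copy is zero even though no rotation or translation was ever fitted; a sampling procedure could, in principle, use a score of this kind to accept or reject candidate residue matches.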
