LNA: Fast Protein Structural Comparison Using a Laplacian Characterization of Tertiary Structure

In the last two decades, many protein 3D structures have been discovered, characterized, and made available through the Protein Data Bank (PDB), which continues to grow very quickly. New scalable methods are thus urgently required to search the PDB efficiently. This paper presents an approach called LNA (Laplacian Norm Alignment) that performs a structural comparison of two proteins using dynamic programming algorithms. This is achieved by characterizing each residue in the protein with scalar features. The feature values are calculated by applying a Laplacian operator to the graph defined by the residue adjacency matrix. The weighted Laplacian operator we use estimates, at various scales, the local deformation of the topology around each residue. On benchmarks widely shared by the community, we obtain results qualitatively similar to those of competing approaches, but with an algorithm one to two orders of magnitude faster. A single recent Graphics Processing Unit (GPU) can perform 180,000 protein comparisons per second, which makes our algorithm highly scalable and suitable for real-time database querying across the web.
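The two stages described above (per-residue Laplacian features, then dynamic programming over the resulting scalar sequences) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian edge weights, the distance `cutoff`, the single scale `sigma`, the scoring rule `bonus - |difference|`, the linear `gap` penalty, and the function names `laplacian_norms` and `align_score` are all assumptions chosen for illustration. The abstract only states that a weighted Laplacian is evaluated at various scales and that the alignment uses dynamic programming.

```python
import math


def laplacian_norms(coords, sigma=6.0, cutoff=12.0):
    """Return one scalar per residue: the norm of a weighted graph Laplacian
    applied to the coordinates (assumed Gaussian weights, assumed cutoff)."""
    feats = []
    for i, p in enumerate(coords):
        acc = [0.0, 0.0, 0.0]  # weighted sum of (neighbour - residue) vectors
        wsum = 0.0
        for j, q in enumerate(coords):
            if i == j:
                continue
            d = math.dist(p, q)
            if d > cutoff:  # residues farther apart are not adjacent
                continue
            w = math.exp(-(d * d) / (2.0 * sigma * sigma))
            wsum += w
            for k in range(3):
                acc[k] += w * (q[k] - p[k])
        feats.append(math.hypot(*acc) / wsum if wsum > 0.0 else 0.0)
    return feats


def align_score(fa, fb, bonus=1.0, gap=-0.5):
    """Global (Needleman-Wunsch style) alignment score of two feature lists.
    Matching residues i and j scores bonus - |fa[i] - fb[j]| (assumed rule)."""
    n, m = len(fa), len(fb)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            match = prev[j - 1] + bonus - abs(fa[i - 1] - fb[j - 1])
            cur[j] = max(match, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[m]


if __name__ == "__main__":
    # Toy input: ten points on a straight line, 3.8 units apart (hypothetical data).
    chain = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    features = laplacian_norms(chain)
    print(features)
    print(align_score(features, features))
```

Because the feature depends only on inter-residue distances and on the length of a difference vector, it is unchanged by rotating or translating the structure, which is why the comparison can run on scalar sequences without a prior superposition. Evaluating `laplacian_norms` at several values of `sigma` would give one feature per scale, in the spirit of the multi-scale operator mentioned above. The dynamic programming step costs time proportional to the product of the two sequence lengths.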
