Representing and comparing protein folds and fold families using three‐dimensional shape‐density representations

The question of how best to compare and classify the three‐dimensional structures of proteins is one of the most important unsolved problems in computational biology. To help tackle this problem, we have developed a novel shape‐density superposition algorithm called 3D‐Blast, which represents and superposes the shapes of protein backbone folds using the spherical polar Fourier correlation technique that we originally developed for protein docking. We compare the utility of this approach with that of several well‐known protein structure alignment algorithms, using receiver operating characteristic (ROC) plots of queries against the “gold standard” CATH database. Despite being completely independent of protein sequences and using no information about the internal geometry of proteins, 3D‐Blast is highly competitive with current state‐of‐the‐art protein structure alignment algorithms in searches of the CATH database. A novel and potentially very useful feature of our approach is that it allows an average or “consensus” fold to be calculated easily for a given group of protein structures. We find that using consensus shapes to represent entire fold families also gives very good database query performance. We propose that consensus fold shapes could provide a powerful new way to index existing protein structure databases, and that they offer an objective way to cluster and classify all of the currently known folds in the protein universe. Proteins 2012. © 2011 Wiley Periodicals, Inc.
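The ideas of a consensus shape and of ROC-based evaluation can be illustrated with a minimal Python sketch. It is not the 3D‐Blast implementation: it assumes each structure has already been superposed and reduced to a real-valued vector of expansion coefficients in an orthonormal basis, so that the overlap of two densities is simply the dot product of their coefficient vectors. The rotational search performed by the spherical polar Fourier correlation is omitted entirely. The function names, the Carbo-style normalised overlap score, and the random toy data are all illustrative assumptions.

```python
import numpy as np


def carbo_similarity(a, b):
    """Normalised overlap of two densities given as coefficient vectors.

    Assumes an orthonormal basis, so the overlap integral reduces to a
    dot product. Returns 1.0 for identical shapes.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))


def consensus_shape(coeff_vectors):
    """Average the coefficient vectors of structures assumed pre-superposed."""
    return np.mean(np.asarray(coeff_vectors, dtype=float), axis=0)


def roc_auc(scores, is_member):
    """Area under the ROC curve for a ranked database search.

    Computed as the probability that a randomly chosen family member
    scores higher than a randomly chosen non-member (ties count half).
    """
    scores = np.asarray(scores, dtype=float)
    is_member = np.asarray(is_member, dtype=bool)
    pos = scores[is_member]
    neg = scores[~is_member]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


def demo(seed=0):
    """Query a toy 'database' with a consensus shape (hypothetical data)."""
    rng = np.random.default_rng(seed)
    n_coeffs = 50
    base = rng.normal(size=n_coeffs)
    # Family members: noisy copies of one underlying shape.
    family = [base + 0.3 * rng.normal(size=n_coeffs) for _ in range(10)]
    # Decoys: unrelated random shapes.
    decoys = [rng.normal(size=n_coeffs) for _ in range(40)]

    # Build the consensus from half the family, then search the rest.
    query = consensus_shape(family[:5])
    database = family[5:] + decoys
    labels = [True] * 5 + [False] * 40
    scores = [carbo_similarity(query, entry) for entry in database]
    return roc_auc(scores, labels)


if __name__ == "__main__":
    print(f"toy ROC AUC: {demo():.3f}")
```

Under these assumptions, a consensus is just a coefficient average, which is why it can be used as a database query in exactly the same way as a single structure.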
