Protein structure mining using a structural alphabet

We present a comprehensive evaluation of a new structure mining method called PB‐ALIGN. It is based on encoding a protein structure as a 1D sequence of 16 short structural motifs, or protein blocks (PBs). PBs are short motifs capable of representing most of the local structural features of a protein backbone. Using a derived PB substitution matrix and a simple dynamic programming algorithm, PB sequences are aligned in the same way as amino acid sequences to yield a structure alignment. Aligning these local features as sequences of symbols enables fast detection of structural similarities between two proteins. The ability of the method to characterize and align regions beyond regular secondary structures, for example, the N and C caps of helices and the loops connecting regular structures, puts it a step ahead of existing methods, which rely strongly on secondary structure elements. PB‐ALIGN achieved an efficiency of 85% in extracting the true fold from a large database of 7259 SCOP domains and identified true super‐family members in 82% of cases. In comparison with 13 existing structure comparison/mining methods, PB‐ALIGN emerged as the best on the general ability test dataset and was on par with methods such as YAKUSA and CE on the nontrivial test dataset. Furthermore, the proposed method performed well when compared with a flexible structure alignment method such as FATCAT and outperformed it in processing speed (less than 45 s per database scan). This work also establishes a reliable cut‐off value for the demarcation of similar folds. Finally, it shows that global alignment scores of unrelated structures using PBs follow an extreme value distribution. PB‐ALIGN is freely available on a web server called Protein Block Expert (PBE) at http://bioinformatics.univ‐reunion.fr/PBE/. Proteins 2008. © 2007 Wiley‐Liss, Inc.
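The alignment step described above can be illustrated with a minimal sketch of global dynamic programming over PB sequences. This is not the PB‐ALIGN implementation: the function names, the linear gap penalty, and above all the scoring table are assumptions made for illustration. In particular, `toy_substitution_matrix` is a placeholder match/mismatch table and does not reproduce the derived PB substitution matrix; only the 16-letter alphabet (PBs labelled a to p) follows the usual PB convention.

```python
from itertools import product

# The 16 protein blocks are conventionally labelled 'a' to 'p'.
PB_ALPHABET = "abcdefghijklmnop"


def toy_substitution_matrix(match=2, mismatch=-1):
    """Placeholder scores (assumption): NOT the derived PB substitution matrix."""
    return {(x, y): (match if x == y else mismatch)
            for x, y in product(PB_ALPHABET, repeat=2)}


def global_align(seq1, seq2, subst, gap=-2):
    """Global (Needleman-Wunsch style) alignment with a linear gap penalty.

    Returns (score, aligned_seq1, aligned_seq2). The gap value is hypothetical.
    """
    n, m = len(seq1), len(seq2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the table: each cell is the best of substitution, or a gap in either sequence.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + subst[(seq1[i - 1], seq2[j - 1])],
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + subst[(seq1[i - 1], seq2[j - 1])]):
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))
```

A call such as `global_align("dfklmmmm", "dfkmmmm", toy_substitution_matrix())` returns the alignment score together with the two gapped PB strings. Replacing the toy table with real PB substitution scores would change the numbers but not the recurrence.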
