Conserved key amino acid positions (CKAAPs) derived from the analysis of common substructures in proteins

An all‐against‐all protein structure comparison using the Combinatorial Extension (CE) algorithm, applied to a representative set of PDB structures, revealed a gallery of common substructures in proteins (http://cl.sdsc.edu/ce.html). These substructures represent commonly identified folds, domains, or components thereof. Most of the subsequences forming these similar substructures have no significant sequence similarity. We present a method that starts from structure alignments and identifies conserved amino acid positions and residue‐dependent property clusters within these subsequences. Each subsequence is aligned to its homologues in SWALL, a nonredundant protein sequence database. The most similar sequences are purged into a common frequency matrix, and weighted homologues of each subsequence are used to score conserved key amino acid positions (CKAAPs). We define the top‐scoring 20% of positions in each substructure as CKAAPs. We hypothesize that CKAAPs may be responsible for the common folding patterns, in either a local or a global view of the protein‐folding pathway. Where a significant number of structures exist, CKAAPs have also been identified in structure alignments of complete polypeptide chains from the same protein family or superfamily. Evidence supporting the presence of CKAAPs comes from other computational approaches and from experimental mutation and protein‐folding studies, notably the Paracelsus challenge. Finally, the structural environment of CKAAPs versus non‐CKAAPs is examined for solvent accessibility, hydrogen bonding, and secondary structure. The identification of CKAAPs has important implications for protein engineering, fold recognition, modeling, and structure prediction studies; it depends on the availability of structures and on an accurate structure alignment methodology. Proteins 2001;42:148–163. © 2000 Wiley‐Liss, Inc.
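
The selection step described above (score every aligned position from weighted homologues, then keep the top 20%) can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the score used here (one minus the normalized Shannon entropy of the weighted residue frequencies) is an assumption chosen for illustration, and the names `column_conservation` and `top_positions` are hypothetical. Only the 20% cutoff comes from the text.

```python
import math
from collections import Counter


def column_conservation(column, weights):
    """Score one alignment column as 1 - normalized Shannon entropy.

    Assumption: this entropy-based score is a stand-in; the abstract
    does not specify the scoring function actually used.
    """
    totals = Counter()
    for residue, weight in zip(column, weights):
        if residue != "-":  # gaps do not contribute to the frequencies
            totals[residue] += weight
    total = sum(totals.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in totals.values())
    return 1.0 - entropy / math.log(20)  # 20 standard amino acids


def top_positions(alignment, weights, fraction=0.2):
    """Return the indices of the highest-scoring columns and all scores.

    `alignment` is a list of equal-length aligned sequences and `weights`
    holds one weight per sequence. `fraction` defaults to the 20% cutoff
    stated in the abstract.
    """
    n_cols = len(alignment[0])
    scores = [
        column_conservation([seq[i] for seq in alignment], weights)
        for i in range(n_cols)
    ]
    n_keep = max(1, round(fraction * n_cols))
    # Rank by descending score; ties are broken by position.
    ranked = sorted(range(n_cols), key=lambda i: (-scores[i], i))
    return sorted(ranked[:n_keep]), scores


# Toy alignment (made-up sequences): columns 0 and 5 are fully conserved.
alignment = ["ACDEFGHIKL", "AMNPQGRSTV", "AWYCDGEFHI", "AKLMNGPQRS"]
weights = [1.0, 1.0, 1.0, 1.0]
key_positions, scores = top_positions(alignment, weights)
print(key_positions)
```

With ten columns and a 20% cutoff, two positions are selected. A real analysis would use sequence weights that down-weight near-identical homologues rather than the uniform weights shown here.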
