An Evolutionary Conservation & Rigidity Analysis Machine Learning Approach for Detecting Critical Protein Residues

Certain amino acids may play a critical role in determining a protein's structure and function. Examples include flexible regions that allow domain motions, and highly conserved residues on functional interfaces that take part in binding and interaction with other proteins. Detecting these regions facilitates the analysis and simulation of protein rigidity and conformational changes, and aids in characterizing protein-protein binding. We present a machine-learning-based method for the analysis and prediction of critical residues in proteins. The method combines amino-acid-specific information with data obtained from two complementary methods. The first, KINARI-Mutagen, performs graph-based rigidity analysis to find rigid clusters of amino acids in a protein. The second uses evolutionary conservation scores to find functional interfaces in proteins. We devised a machine learning model that combines the output of both methods with amino acid type and solvent accessible surface area, and applied it to a dataset of proteins with experimentally known critical residues. The model achieved a prediction rate of over 77%, higher than either method achieved on its own.
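As a rough illustration of how the four kinds of per-residue information named above could be combined into one input for a classifier, the sketch below builds a single feature vector per residue. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Residue` class, the `encode` function, the one-hot encoding of amino acid type, and the use of a single number each for the rigidity and conservation signals are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List

# The 20 standard amino acids, by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Residue:
    """Hypothetical per-residue record; field names are assumptions."""
    aa: str              # one-letter amino acid code
    rigidity: float      # assumed scalar summary of the rigidity analysis
    conservation: float  # assumed scalar evolutionary conservation score
    sasa: float          # solvent accessible surface area, square angstroms


def encode(res: Residue) -> List[float]:
    """Return a 23-element vector: 20 one-hot slots plus 3 numeric features."""
    if len(res.aa) != 1 or res.aa not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code: {res.aa!r}")
    one_hot = [1.0 if res.aa == code else 0.0 for code in AMINO_ACIDS]
    return one_hot + [res.rigidity, res.conservation, res.sasa]


# Example with made-up values, purely to show the shape of the vector.
example = encode(Residue(aa="A", rigidity=0.5, conservation=0.9, sasa=12.0))
```

A vector built this way could be passed, together with a label marking whether the residue is experimentally known to be critical, to any standard supervised classifier. In practice the numeric features would usually be scaled to comparable ranges first.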
