A general method for predicting amino acid residues experiencing hydrogen exchange

Information on protein hydrogen exchange can help delineate key regions involved in protein-protein interactions and provides important insight into the functional roles of genetic variants and their possible mechanisms in disease processes. Previous studies have shown that the degree of hydrogen exchange is affected by hydrogen bond formation, solvent accessibility, proximity to other residues, and experimental conditions. However, a general method for predicting which residues are capable of hydrogen exchange, one that transfers to a broad set of proteins, is lacking. We have developed a machine learning method based on random forest that predicts whether a residue experiences hydrogen exchange. The method is trained and evaluated on data from the Start2Fold database, which contains information on 13,306 residues (3,790 of which experience hydrogen exchange and 9,516 of which do not). We achieve an overall out-of-bag (OOB) error, an unbiased estimate of the test set error, of 20.3 percent. On a randomly selected test data set consisting of 500 residues that experience hydrogen exchange and 500 that do not, our method achieves an accuracy of 0.79, a recall of 0.74, a precision of 0.82, and an F1 score of 0.78.
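The four test-set metrics are linked through a single confusion matrix, so they can be checked against one another. The sketch below is illustrative only: the abstract does not report the individual counts, and the values used here (370 true positives, 81 false positives) are hypothetical numbers chosen because they are consistent with the rounded metrics on a balanced 500/500 test set. The function name `classification_metrics` is likewise an assumption made for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn)        # fraction of exchanging residues recovered
    precision = tp / (tp + fp)     # fraction of predicted exchangers that are correct
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts (NOT reported in the abstract): 500 exchanging residues
# split into tp + fn, and 500 non-exchanging residues split into fp + tn.
metrics = classification_metrics(tp=370, fp=81, fn=130, tn=419)

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, the rounded values agree with the reported accuracy, recall, precision, and F1 score, which suggests the four figures are mutually consistent.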
