Accurate Prediction of Chemical Shifts for Aqueous Protein Structure for "Real World" Cases using Machine Learning

Accurate prediction of NMR chemical shifts can in principle help refine aqueous solution structures of proteins to the quality of X-ray structures. We report a new machine learning algorithm for protein chemical shift prediction that outperforms existing chemical shift calculators on realistic NMR solution data. Our UCBShift predictor implements two modules: a transfer prediction module that uses both sequence and structural alignment to select reference candidates whose experimental chemical shifts are replicated, and a redesigned machine learning module based on random forest regression that uses a larger and more carefully curated set of extracted features. Combined, the two modules achieve state-of-the-art accuracy for predicting chemical shifts on a "real-world" dataset, with root-mean-square errors of 0.31 ppm for amide hydrogens, 0.19 ppm for Hα, 0.87 ppm for C, 0.81 ppm for Cα, 1.01 ppm for Cβ, and 1.83 ppm for N, surpassing the accuracy of popular chemical shift predictors such as SPARTA+ and SHIFTX2.
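The per-atom root-mean-square errors quoted above can be illustrated with a minimal sketch. The function name `rmse_by_atom_type`, the record layout, and the shift values are assumptions made for illustration only; they are not taken from UCBShift or from the paper's data.

```python
import math
from collections import defaultdict


def rmse_by_atom_type(records):
    """Return {atom_type: RMSE in ppm} for (atom_type, predicted, observed) records.

    Hypothetical helper: the record layout is an assumption, not the UCBShift format.
    """
    squared_errors = defaultdict(list)
    for atom_type, predicted, observed in records:
        squared_errors[atom_type].append((predicted - observed) ** 2)
    # RMSE = sqrt(mean of squared prediction errors), computed separately per atom type
    return {
        atom_type: math.sqrt(sum(errors) / len(errors))
        for atom_type, errors in squared_errors.items()
    }


# Made-up shifts in ppm, for illustration only (not experimental data)
example_records = [
    ("H", 8.30, 8.00),
    ("H", 7.60, 8.00),
    ("CA", 56.0, 55.0),
    ("CA", 54.0, 55.0),
]
per_atom_rmse = rmse_by_atom_type(example_records)
for atom_type, value in per_atom_rmse.items():
    print(f"{atom_type}: {value:.2f} ppm")
```

Reporting the error separately for each atom type matters because the shift ranges differ widely between nuclei, so a single pooled RMSE would be dominated by N and the carbon atoms.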
