Structure-Based Chemical Shift Prediction Using Random Forests Non-Linear Regression

Protein nuclear magnetic resonance (NMR) chemical shifts are among the most accurately measurable spectroscopic parameters. Because they depend on the local electronic environment, they are closely correlated with protein structure, although the precise nature of this correlation remains largely unknown. Accurate prediction of chemical shifts from the atomic co-ordinates of existing structures would allow this relationship to be studied closely. This paper presents a novel non-linear regression approach to predicting chemical shifts from protein structure. The regression model combines quantum, classical and empirical variables, and across the protein backbone atom types it yields statistically significant improvements in prediction accuracy over existing chemical shift predictors. The results presented here were obtained by applying the Random Forest regression algorithm to a data set of protein entries derived from the RefDB re-referenced chemical shift database.
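
The abstract names Random Forest regression without describing how it works. The following is a minimal, illustrative sketch of the general technique only, not the authors' implementation. It grows regression trees on bootstrap samples of the training rows, lets each split consider only a random subset of the features, and averages the trees' outputs. The class name `RandomForestRegressor`, its parameters (`n_trees`, `max_features`, `max_depth`, `min_leaf`, `seed`) and the toy inputs are hypothetical. In practice each row would hold per-atom structural descriptors (the quantum, classical and empirical variables), and each target would be an observed chemical shift.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared errors of ys around their mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def _grow(X, y, rng, k, depth, max_depth, min_leaf):
    """Grow one regression tree. Leaves are floats; internal nodes are
    (feature, threshold, left_subtree, right_subtree) tuples."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return float(mean(y))
    # Random Forest step: consider only k randomly chosen features at this node.
    best = None  # (sse, feature, threshold)
    for f in rng.sample(range(len(X[0])), k):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold midway between observed values
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no admissible split: make a leaf
        return float(mean(y))
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            _grow([X[i] for i in li], [y[i] for i in li], rng, k, depth + 1, max_depth, min_leaf),
            _grow([X[i] for i in ri], [y[i] for i in ri], rng, k, depth + 1, max_depth, min_leaf))


def _predict_one(node, x):
    """Walk from the root to a leaf and return that leaf's mean target."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


class RandomForestRegressor:
    """Hypothetical minimal Random Forest regressor (pure Python, illustrative)."""

    def __init__(self, n_trees=50, max_features=None, max_depth=8, min_leaf=2, seed=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.max_depth = max_depth
        self.min_leaf = min_leaf
        self.seed = seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)  # fixed seed for reproducible forests
        n, p = len(X), len(X[0])
        # Default to roughly p/3 features per split, a common choice for regression.
        k = min(p, self.max_features or max(1, p // 3))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
            self.trees.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                                    rng, k, 0, self.max_depth, self.min_leaf))
        return self

    def predict(self, X):
        # The forest prediction is the average of the individual tree predictions.
        return [mean([_predict_one(tree, x) for tree in self.trees]) for x in X]
```

Usage follows the usual fit/predict pattern. Call `RandomForestRegressor(n_trees=100).fit(X_train, shifts_train)` on rows of (hypothetical) descriptors, then call `predict(X_new)` to get one averaged shift estimate per row. Averaging many decorrelated trees lowers variance relative to a single deep tree, which is the main motivation for the ensemble.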
