A Markov Random Field Framework for Protein Side-Chain Resonance Assignment

Nuclear magnetic resonance (NMR) spectroscopy plays a critical role in structural genomics, and serves as a primary tool for determining protein structures, dynamics and interactions in physiologically relevant solution conditions. The current speed of protein structure determination via NMR is limited by the lengthy time required for resonance assignment, which maps spectral peaks to specific atoms and residues in the primary sequence. Although numerous algorithms have been developed to address the backbone resonance assignment problem [68,2,10,37,14,64,1,31,60], little work has been done to automate side-chain resonance assignment [43, 48, 5]. Most previous attempts at assigning side-chain resonances depend on a set of NMR experiments that record through-bond interactions with side-chain protons for each residue. Unfortunately, these NMR experiments have low sensitivity and limited performance on large proteins, which makes it difficult to obtain enough side-chain resonance assignments. On the other hand, obtaining almost all of the side-chain resonance assignments is a prerequisite for high-resolution structure determination. To overcome this deficiency, we present a novel side-chain resonance assignment algorithm based on alternative NMR experiments that measure through-space interactions between protons in the protein; these experiments also provide crucial distance restraints and are normally required in high-resolution structure determination. We cast the side-chain resonance assignment problem into a Markov Random Field (MRF) framework, and extend and apply combinatorial protein design algorithms to compute the optimal solution that best interprets the NMR data. Our MRF framework captures the contact map information of the protein derived from NMR spectra, and exploits the structural information available from the backbone conformations determined by orientational restraints and a set of discretized side-chain conformations (i.e., rotamers). A Hausdorff-based computation is employed in the scoring function to evaluate the probability that a set of side-chain resonance assignments generates the observed NMR spectra. The complexity of the assignment problem is first reduced by using a dead-end elimination (DEE) algorithm, which prunes side-chain resonance assignments that are provably not part of the optimal solution. Then an A* search algorithm is used to find a set of optimal side-chain resonance assignments that best fit the NMR data. We have tested our algorithm on NMR data for five proteins, including the FF Domain 2 of human transcription elongation factor CA150 (FF2), the B1 domain of Protein G (GB1), human ubiquitin, the ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta (pol η UBZ), and the human Set2-Rpb1 interacting domain (hSRI). Our algorithm assigns resonances for more than 90% of the protons in the proteins, and achieves about 80% correct side-chain resonance assignments. The final structures computed using distance restraints resulting from the set of assigned side-chain resonances have backbone RMSD 0.5−1.4 Å and all-heavy-atom RMSD 1.0−2.2 Å from the reference structures that were determined by X-ray crystallography or traditional NMR approaches. These results demonstrate that our algorithm can be successfully applied to automate side-chain resonance assignment and high-quality protein structure determination. Since our algorithm does not require any specific NMR experiments for measuring the through-bond interactions with side-chain protons, it can save a significant amount of both experimental cost and spectrometer time, and hence accelerate the NMR structure determination process.
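The pruning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a generic pairwise MRF energy (lower is better), uses the Goldstein form of the DEE criterion, and replaces the A* search with exhaustive enumeration for brevity. The toy energies and all function names (`pair_term`, `total_energy`, `dee_prune`, `best_assignment`) are hypothetical and are not derived from NMR data or a Hausdorff-based score.

```python
from itertools import product

# Hypothetical toy instance: 3 sites, each with 3 candidate labels.
# SELF_E[i][a] is the unary energy of label a at site i.
SELF_E = [[0, 2, 5], [1, 0, 4], [3, 0, 1]]
# PAIR_E[(i, j)][a][b] is the pairwise energy of (a at i, b at j), i < j.
PAIR_E = {
    (0, 1): [[0, 1, 2], [1, 0, 2], [3, 3, 3]],
    (0, 2): [[0, 2, 1], [1, 0, 1], [2, 2, 2]],
    (1, 2): [[1, 0, 2], [0, 1, 0], [2, 2, 2]],
}


def pair_term(pair_e, i, a, j, b):
    """Look up the pairwise energy regardless of the order of i and j."""
    return pair_e[(i, j)][a][b] if i < j else pair_e[(j, i)][b][a]


def total_energy(assign, self_e, pair_e):
    """Energy of a full assignment: unary terms plus all pairwise terms."""
    n = len(assign)
    energy = sum(self_e[i][assign[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy += pair_e[(i, j)][assign[i]][assign[j]]
    return energy


def dee_prune(self_e, pair_e):
    """Goldstein DEE: remove label a at site i if some other label b at i
    gives a strictly lower energy for every choice of labels elsewhere."""
    n = len(self_e)
    alive = [set(range(len(labels))) for labels in self_e]
    changed = True
    while changed:  # repeat until no further label can be eliminated
        changed = False
        for i in range(n):
            for a in list(alive[i]):
                for b in alive[i] - {a}:
                    gap = self_e[i][a] - self_e[i][b]
                    for j in range(n):
                        if j != i:
                            gap += min(
                                pair_term(pair_e, i, a, j, c)
                                - pair_term(pair_e, i, b, j, c)
                                for c in alive[j]
                            )
                    if gap > 0:  # a can never be part of an optimal solution
                        alive[i].discard(a)
                        changed = True
                        break
    return alive


def best_assignment(alive, self_e, pair_e):
    """Exhaustive search over surviving labels (stand-in for A* search)."""
    candidates = product(*(sorted(labels) for labels in alive))
    return min((total_energy(c, self_e, pair_e), c) for c in candidates)


alive = dee_prune(SELF_E, PAIR_E)
print("surviving labels per site:", alive)
print("optimum after pruning:", best_assignment(alive, SELF_E, PAIR_E))
```

Because the elimination criterion uses a strict inequality, every optimal assignment survives pruning, so searching only the surviving labels yields the same minimum energy as searching the full space.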

[1]  Werner Braun,et al.  Automated combined assignment of NOESY spectra and three-dimensional protein structure determination , 1997, Journal of biomolecular NMR.

[2]  J H Prestegard,et al.  Rapid determination of protein folds using residual dipolar couplings. , 2000, Journal of molecular biology.

[3]  Bruce Randall Donald,et al.  Exact Solutions for Internuclear Vectors and Backbone Dihedral Angles from NH Residual Dipolar Couplings in Two Media, and their Application in a Systematic Search Algorithm for Determining Protein Backbone Structure , 2004, Journal of biomolecular NMR.

[4]  Jun Hyoung Lee,et al.  Phenotypic engineering by reprogramming gene transcription using novel artificial transcription factors in Escherichia coli , 2008, Nucleic acids research.

[5]  S. Li,et al.  Structure of the ubiquitin‐binding zinc finger domain of human DNA Y‐polymerase η , 2007, EMBO reports.

[6]  J. Prestegard,et al.  Residual Dipolar Couplings in Structure Determination of Biomolecules , 2004 .

[7]  Johan Desmet,et al.  The dead-end elimination theorem and its use in protein side-chain positioning , 1992, Nature.

[8]  Brian E Coggins,et al.  PACES: Protein sequential assignment by computer-assisted exhaustive search , 2003, Journal of biomolecular NMR.

[9]  Bruce Randall Donald,et al.  An expectation/maximization nuclear vector replacement algorithm for automated NMR resonance assignments , 2004, Journal of biomolecular NMR.

[10]  H N Moseley,et al.  Automated analysis of NMR assignments and structures for proteins. , 1999, Current opinion in structural biology.

[11]  Kuo-Bin Li,et al.  Automated Extracting of Amino Acid Spin Systems in Proteins Using 3D HCCH-COSY/TOCSY Spectroscopy and Constrained Partitioning Algorithm (CPA) , 1996, J. Chem. Inf. Comput. Sci..

[12]  Chris Bailey-Kellogg,et al.  An efficient randomized algorithm for contact-based NMR backbone resonance assignment , 2006, Bioinform..

[13]  J. Besag Spatial Interaction and the Statistical Analysis of Lattice Systems , 1974 .

[14]  P. Güntert Automated NMR structure calculation with CYANA. , 2004, Methods in molecular biology.

[15]  Anthony K. Yan,et al.  A polynomial-time nuclear vector replacement algorithm for automated NMR resonance assignments , 2003, RECOMB '03.

[16]  Eric P. Xing,et al.  Free Energy Estimates of All-Atom Protein Structures Using Generalized Belief Propagation , 2007, RECOMB.

[17]  Charles D Schwieters,et al.  Completely automated, highly error-tolerant macromolecular structure determination from multidimensional nuclear overhauser enhancement spectra and chemical shift assignments. , 2004, Journal of the American Chemical Society.

[18]  G. Wagner,et al.  Efficient side-chain and backbone assignment in large proteins: Application to tGCN5 , 1999, Journal of biomolecular NMR.

[19]  Wen-Lian Hsu,et al.  RIBRA-An Error-Tolerant Algorithm for the NMR Backbone Assignment Problem , 2005, RECOMB.

[20]  J H Prestegard,et al.  Nuclear magnetic dipole interactions in field-oriented proteins: information for structure determination in solution. , 1995, Proceedings of the National Academy of Sciences of the United States of America.

[21]  Hongzhe Li,et al.  A Markov random field model for network-based analysis of genomic data , 2007, Bioinform..

[22]  Rochus Keller,et al.  SideLink: automated side-chain assignment of biopolymers from NMR data by relative-hypothesis-prioritization-based simulated logic. , 2006, Journal of magnetic resonance.

[23]  Thérèse E Malliavin,et al.  From NMR chemical shifts to amino acid types: Investigation of the predictive power carried by nuclei , 2004, Journal of biomolecular NMR.

[24]  A. Bax,et al.  Measurement of J and dipolar couplings from simplified two-dimensional NMR spectra. , 1998, Journal of magnetic resonance.

[25]  Chris Bailey-Kellogg,et al.  Inferential backbone assignment for sparse data , 2006, Journal of biomolecular NMR.

[26]  G. Montelione,et al.  Automated analysis of protein NMR assignments using methods from artificial intelligence. , 1997, Journal of molecular biology.

[27]  Miron Livny,et al.  BioMagResBank , 2007, Nucleic Acids Res..

[28]  Gaetano T Montelione,et al.  Automated analysis of protein NMR assignments and structures. , 2004, Chemical reviews.

[29]  Torsten Herrmann,et al.  Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. , 2002, Journal of molecular biology.

[30]  A. Bax,et al.  Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. , 1997, Science.

[31]  A R Leach,et al.  Exploring the conformational space of protein side chains using dead‐end elimination and the A* algorithm , 1998, Proteins.

[32]  Michael Nilges,et al.  Inferential Structure Determination , 2005 .

[33]  J. Laurie Snell,et al.  Markov Random Fields and Their Applications , 1980 .

[34]  Bruce Randall Donald,et al.  A Polynomial-Time Algorithm for de novo Protein Backbone Structure Determination from NMR Data , 2006 .

[35]  Charles D Schwieters,et al.  The Xplor-NIH NMR molecular structure determination package. , 2003, Journal of magnetic resonance.

[36]  Amy C. Anderson,et al.  Computational structure-based redesign of enzyme activity , 2009, Proceedings of the National Academy of Sciences.

[37]  Francesco Fiorito,et al.  Automated Resonance Assignment of Proteins: 6D APSY-NMR , 2006, Journal of biomolecular NMR.

[38]  Francesco Fiorito,et al.  Automated amino acid side-chain NMR assignment of proteins using 13C- and 15N-resolved 3D [1H,1H]-NOESY , 2008, Journal of biomolecular NMR.

[39]  Bruce Randall Donald,et al.  High-resolution protein structure determination starting with a global fold calculated from exact solutions to the RDC equations , 2009, Journal of biomolecular NMR.

[40]  G. Ball,et al.  Measurement of one-bond 13Cα–1Hα residual dipolar coupling constants in proteins by selective manipulation of CαHα spins , 2006 .

[41]  Robert Powers,et al.  A topology‐constrained distance network algorithm for protein structure determination from NOESY data , 2005, Proteins.

[42]  Alexander Grishaev,et al.  Protein structure elucidation from NMR proton densities , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[43]  B. Donald,et al.  Symbolic and Numerical Computation for Artificial Intelligence , 1997 .

[44]  Peter Güntert,et al.  Automated NMR protein structure calculation , 2003 .

[45]  Daniel P. Huttenlocher,et al.  Computing visual correspondence: incorporating the probability of a false match , 1995, Proceedings of IEEE International Conference on Computer Vision.

[46]  Chris Bailey-Kellogg,et al.  A random graph approach to NMR sequential assignment. , 2005 .

[47]  Ad Bax,et al.  Validation of Protein Structure from Anisotropic Carbonyl Chemical Shifts in a Dilute Liquid Crystalline Phase , 1998 .

[48]  Bruce R Donald,et al.  Automated NMR Assignment and Protein Structure Determination using Sparse Dipolar Coupling Constraints. , 2009, Progress in nuclear magnetic resonance spectroscopy.

[49]  Ke Ruan,et al.  De novo determination of internuclear vector orientations from residual dipolar couplings measured in three independent alignment media , 2008, Journal of biomolecular NMR.

[50]  Alexander Grishaev,et al.  CLOUDS, a protocol for deriving a molecular proton density via NMR , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[51]  Sebastian Hiller,et al.  Automated NMR assignment of protein side chain resonances using automated projection spectroscopy (APSY). , 2008, Journal of the American Chemical Society.

[52]  Olga Veksler,et al.  Markov random fields with efficient approximations , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[53]  Daniel P. Huttenlocher,et al.  Comparing Images Using the Hausdorff Distance , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[54]  G. Marius Clore,et al.  Improving the Packing and Accuracy of NMR Structures with a Pseudopotential for the Radius of Gyration , 1999 .

[55]  A. Sali,et al.  Protein Structure Prediction and Structural Genomics , 2001, Science.

[56]  Michael Nilges,et al.  ARIA: automated NOE assignment and NMR structure calculation , 2003, Bioinform..

[57]  Anthony K. Yan,et al.  A Polynomial-Time Nuclear Vector Replacement Algorithm for Automated NMR Resonance Assignments , 2004, J. Comput. Biol..

[58]  Chris Bailey-Kellogg,et al.  The NOESY Jigsaw: Automated Protein Secondary Structure and Main-Chain Assignment from Sparse, Unassigned NMR Data , 2000, J. Comput. Biol..

[59]  Bruce Randall Donald,et al.  3D structural homology detection via unassigned residual dipolar couplings , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[60]  L L Looger,et al.  Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. , 2001, Journal of molecular biology.

[61]  Bruce Randall Donald,et al.  A Polynomial-Time Algorithm for De Novo Protein Backbone Structure Determination from Nuclear Magnetic Resonance Data , 2006, J. Comput. Biol..

[62]  R. Goldstein Efficient rotamer elimination applied to protein side-chains and related spin glasses. , 1994, Biophysical journal.

[63]  Bruce Randall Donald,et al.  A Novel Minimized Dead-End Elimination Criterion and Its Application to Protein Redesign in a Hybrid Scoring and Search Algorithm for Computing Partition Functions over Molecular Ensembles , 2006, RECOMB.

[64]  Bruce Randall Donald,et al.  High-Throughput 3D Structural Homology Detection via NMR Resonance Assignment , 2004 .

[65]  Changhe Yuan,et al.  Dynamic Weighting A* Search-based MAP Algorithm for Bayesian Networks , 2006, Probabilistic Graphical Models.

[66]  Ying Xu,et al.  An Efficient Computational Method for Globally Optimal Threading , 1998, J. Comput. Biol..

[67]  Kuo-Bin Li,et al.  Automated Resonance Assignment of Proteins Using Heteronuclear 3D NMR, 2. Side Chain and Sequence-Specific Assignment , 1997, J. Chem. Inf. Comput. Sci..

[68]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[69]  Bruce Randall Donald,et al.  3D structural homology detection via NMR resonance assignment , 2004, Proceedings. 2004 IEEE Computational Systems Bioinformatics Conference, 2004. CSB 2004..

[70]  Arash Bahrami,et al.  Probabilistic Identification of Spin Systems and their Assignments including Coil–Helix Inference as Output (PISTACHIO) , 2005, Journal of biomolecular NMR.

[71]  H. Phatnani,et al.  Solution structure of the Set2-Rpb1 interacting domain of human Set2 and its interaction with the hyperphosphorylated C-terminal domain of Rpb1. , 2005, Proceedings of the National Academy of Sciences of the United States of America.