A random graph approach to NMR sequential assignment

Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics, and interactions in solution. A necessary first step for such applications is determining the resonance assignment, mapping spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on information regarding connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Significant ambiguity exists in both connectivity and amino acid type, and different algorithms have combined the information in two phases (find short unambiguous strings, then align) or simultaneously (align while extending strings). This paper focuses on the information content available in connectivity alone, allowing for ambiguity rather than handling only unambiguous strings, and complements existing work on the information content in amino acid type.

In this paper, we develop a novel random-graph theoretic framework for algorithmic analysis of NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy (a key source of connectivity ambiguity). We then give a simple and natural randomized algorithm for finding an optimum sequential cover. The algorithm naturally and efficiently reuses substrings while exploring connectivity choices; it overcomes local ambiguity by enforcing global consistency of all choices. We employ our random graph model to analyze our algorithm, and show that it can provably tolerate a relatively large ambiguity while still giving expected optimal performance in polynomial time. To study the algorithm's performance in practice, we tested it on experimental data sets from a variety of proteins and experimental set-ups. The algorithm was able to overcome significant noise and local ambiguity and consistently identify significant sequential fragments.
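
To illustrate how chemical shift degeneracy produces connectivity ambiguity, the following is a minimal sketch and not the model or algorithm of the paper. It assumes one noise-free chemical shift per residue, drawn uniformly at random, and a hypothetical matching tolerance `tol`; the names `make_shifts` and `connectivity_graph` are illustrative only. Spin system `j` is allowed to precede spin system `i` whenever the shift of `j` matches the sequential shift observed at `i` within `tol`, so any residue whose shift is close to that of the true predecessor adds a spurious edge.

```python
import random


def make_shifts(n, seed=0):
    """Hypothetical data: one uniform random shift per residue (an assumption)."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def connectivity_graph(shifts, tol):
    """Return directed edges (j, i): spin system j may precede spin system i.

    The sequential shift observed at residue i is taken to be the shift of
    residue i - 1, with no measurement noise (a simplifying assumption).
    """
    n = len(shifts)
    edges = set()
    for i in range(1, n):
        prev_shift = shifts[i - 1]
        for j in range(n):
            if j != i and abs(shifts[j] - prev_shift) <= tol:
                edges.add((j, i))
    return edges


n = 50
shifts = make_shifts(n)
true_edges = {(i - 1, i) for i in range(1, n)}

# Spurious edges appear as the tolerance (and hence degeneracy) grows.
for tol in (0.0, 0.01, 0.05):
    edges = connectivity_graph(shifts, tol)
    print(f"tol={tol}: {len(edges - true_edges)} spurious edges")
```

In this sketch the correct sequential path is always present in the graph, and the assignment problem amounts to recovering it among the spurious edges; the larger the tolerance, the more candidate predecessors each spin system has.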
