A random graph approach to NMR sequential assignment

Nuclear magnetic resonance (NMR) spectroscopy allows scientists to study protein structure, dynamics, and interactions in solution. A necessary first step for such applications is determining the resonance assignment: mapping spectral data to atoms and residues in the primary sequence. Automated resonance assignment algorithms rely on two kinds of information, connectivity (e.g., through-bond atomic interactions) and amino acid type, typically using the former to determine strings of connected residues and the latter to map those strings to positions in the primary sequence. Both kinds of information are significantly ambiguous. This paper focuses on the information content available in connectivity alone and develops a novel random-graph-theoretic framework and algorithm for connectivity-driven NMR sequential assignment. Our random graph model captures the structure of chemical shift degeneracy, a key source of connectivity ambiguity. We then give a simple, natural randomized algorithm for finding optimal assignments as sets of connected fragments in NMR graphs. The algorithm efficiently reuses substrings while exploring connectivity choices, and it overcomes local ambiguity by enforcing global consistency of all choices. By analyzing the algorithm under our random graph model, we show that it provably tolerates relatively large ambiguity while still achieving expected optimal performance in polynomial time. We apply the algorithm to experimental datasets from a variety of proteins and experimental setups, and demonstrate that our approach overcomes significant noise and local ambiguity to identify substantial fragments of the sequential assignment.
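
The abstract does not spell out its random graph model or its algorithm, so the following is only a toy illustration of the general idea of randomized search over an ambiguous connectivity graph, not the paper's method. It assumes (1) an undirected graph whose vertices are spin systems, (2) a hidden "true" sequential path plus independent spurious edges added with probability `p` as a crude stand-in for chemical shift degeneracy, and (3) the classical rotation-extension heuristic for Hamiltonian paths in random graphs (Posa rotations, in the style of Angluin and Valiant), swapped in as a simpler relative of the paper's algorithm. Rotations show how a randomized search can reuse an already-built sub-path instead of discarding it. All function names (`planted_path_graph`, `rotation_extension`, `find_assignment_path`, `correct_links`) are hypothetical.

```python
# Hypothetical toy sketch: the model and names are illustrative assumptions,
# not the paper's random graph model or algorithm.
import random
from typing import List, Optional, Sequence, Set, Tuple

Graph = List[Set[int]]


def planted_path_graph(n: int, p: float, rng: random.Random) -> Tuple[Graph, List[int]]:
    """Hidden path over n spin systems plus spurious edges with probability p.
    Undirected for simplicity (real NMR connectivity is directed)."""
    order = list(range(n))
    rng.shuffle(order)                      # hidden sequential order
    adj: Graph = [set() for _ in range(n)]
    for a, b in zip(order, order[1:]):      # true sequential links
        adj[a].add(b)
        adj[b].add(a)
    for u in range(n):                      # spurious links ("ambiguity")
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj, order


def rotation_extension(adj: Graph, rng: random.Random, max_steps: int) -> Optional[List[int]]:
    """One randomized rotation-extension run.
    Returns a Hamiltonian path, or None if max_steps is exhausted."""
    n = len(adj)
    path = [rng.randrange(n)]
    on_path = {path[0]}
    for _ in range(max_steps):
        if len(path) == n:
            return path
        if rng.random() < 0.5:
            path.reverse()                  # work from the other endpoint
        end = path[-1]
        if not adj[end]:
            return None                     # isolated vertex: no path possible
        v = rng.choice(sorted(adj[end]))    # random neighbor of the endpoint
        if v not in on_path:
            path.append(v)                  # extension: grow the fragment
            on_path.add(v)
        else:
            # Rotation: v0..vi, vk..v(i+1) is still a path because (vk, vi)
            # is an edge; the existing sub-path is reused, just reversed.
            i = path.index(v)
            path[i + 1:] = reversed(path[i + 1:])
    return path if len(path) == n else None


def find_assignment_path(adj: Graph, rng: random.Random, restarts: int = 20,
                         max_steps: Optional[int] = None) -> Optional[List[int]]:
    """Restart rotation-extension a few times; return the first full path found."""
    max_steps = max_steps or 50 * len(adj) ** 2
    for _ in range(restarts):
        path = rotation_extension(adj, rng, max_steps)
        if path is not None:
            return path
    return None


def correct_links(path: Sequence[int], order: Sequence[int]) -> int:
    """Count consecutive pairs in path that are true (hidden) sequential links."""
    pos = {v: k for k, v in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) == 1 for a, b in zip(path, path[1:]))


if __name__ == "__main__":
    rng = random.Random(0)
    adj, order = planted_path_graph(n=40, p=0.1, rng=rng)
    path = find_assignment_path(adj, rng)
    if path is not None:
        print(f"correct links: {correct_links(path, order)}/{len(order) - 1}")
```

Note that with noise edges present, many Hamiltonian paths may exist, so the path found need not be the hidden one; `correct_links` measures how much of the true connectivity it recovers. The abstract's approach instead targets optimal sets of connected fragments and enforces global consistency, which this sketch does not attempt.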
