An efficient randomized algorithm for contact-based NMR backbone resonance assignment

MOTIVATION: Backbone resonance assignment is a critical bottleneck in studies of protein structure, dynamics and interactions by nuclear magnetic resonance (NMR) spectroscopy. A minimalist approach to assignment, which we call 'contact-based', seeks to dramatically reduce experimental time and expense by replacing the standard suite of through-bond experiments with the through-space nuclear Overhauser enhancement spectroscopy (NOESY) experiment. In the contact-based approach, spectral data are represented as a graph with vertices for putative residues (of unknown relation to the primary sequence) and edges for hypothesized NOESY interactions, such that observed spectral peaks could be explained if the residues were 'close enough'. Because of experimental ambiguity, several incorrect edges can be hypothesized for each spectral peak. An assignment is derived by identifying consistent patterns of edges (e.g. for alpha-helices and beta-sheets) within the graph and by mapping the vertices to the primary sequence. The key algorithmic challenge is to uncover these patterns even when they are obscured by significant noise.

RESULTS: This paper develops, analyzes and applies a novel algorithm for identifying polytopes that represent consistent patterns of edges in a corrupted NOESY graph. Our randomized algorithm aggregates simplices into polytopes and fixes inconsistencies with simple local modifications, called rotations, that maintain most of the structure already uncovered. To characterize the effects of experimental noise, we employ an NMR-specific random graph model and prove that our algorithm gives optimal performance in expected polynomial time, even when the input graph is significantly corrupted. We confirm this analysis in simulation studies with graphs corrupted by up to 500% noise. Finally, we demonstrate the practical application of the algorithm on several experimental beta-sheet datasets. Our approach is able to eliminate a large majority of noise edges and to uncover large consistent sets of interactions.

AVAILABILITY: Our algorithm has been implemented in platform-independent Python code. The software can be freely obtained for academic use by request from the authors.
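The graph representation and the noise level used in the simulation studies can be illustrated with a small sketch. It is not the authors' code or their NMR-specific random graph model: the function name `corrupt_graph`, the uniform choice of false edges, the chain-shaped set of true contacts, and the reading of "500% noise" as five false edges per true edge are all assumptions made for illustration.

```python
import random


def corrupt_graph(true_edges, n_vertices, noise_ratio, seed=0):
    """Add round(noise_ratio * |true edges|) random false edges to a graph.

    Hypothetical helper: false edges are drawn uniformly over vertex pairs,
    which is a simplification, not the paper's NMR-specific noise model.
    Returns (true_set, noise_set) as sets of (u, v) tuples with u < v.
    """
    rng = random.Random(seed)
    true_set = {tuple(sorted(e)) for e in true_edges}
    n_noise = round(noise_ratio * len(true_set))

    # Guard against asking for more distinct edges than the graph can hold.
    max_edges = n_vertices * (n_vertices - 1) // 2
    if len(true_set) + n_noise > max_edges:
        raise ValueError("not enough vertex pairs for the requested noise")

    noise_set = set()
    while len(noise_set) < n_noise:
        u, v = rng.sample(range(n_vertices), 2)  # two distinct vertices
        edge = (min(u, v), max(u, v))
        if edge not in true_set:
            noise_set.add(edge)
    return true_set, noise_set


# Assumed toy input: 20 putative residues whose true contacts form a chain.
n = 20
true_edges = [(i, i + 1) for i in range(n - 1)]
true_set, noise_set = corrupt_graph(true_edges, n, noise_ratio=5.0)
observed = true_set | noise_set  # what a pattern-finding algorithm would see
```

Under these assumptions the observed graph holds five false edges for every true one, so any pattern-finding step has to recover the true contacts from a graph in which they are a small minority.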
