Reconsidering complete search algorithms for protein backbone NMR assignment

MOTIVATION Nuclear magnetic resonance (NMR) spectroscopy is widely used to determine and analyze protein structures. An essential step in NMR studies is determining the backbone resonance assignment, which maps individual atoms to experimentally measured resonance frequencies. Performing assignment is challenging owing to the noise and ambiguity in NMR spectra. Although automated procedures have been investigated, by and large they still struggle to gain acceptance because of inherent limits in scalability and/or unacceptable levels of assignment error. To have confidence in the results, an algorithm should be complete, i.e. able to identify all solutions consistent with the data, including all arbitrary configurations of extra and missing peaks. The ensuing combinatorial explosion in the space of possible assignments has led to the perception that complete search is hopelessly inefficient and cannot scale to realistic datasets.
RESULTS This paper presents a complete branch-contract-and-bound search algorithm for backbone resonance assignment. The algorithm controls the search space by hierarchically agglomerating partial assignments and employing statistically sound pruning criteria. It considers all solutions consistent with the data, and uniformly treats all combinations of extra and missing data. We demonstrate our approach on experimental data from five proteins ranging in size from 70 to 154 residues. The algorithm assigns >95% of the positions with >98% accuracy. We also present results on simulated data from 259 proteins from the RefDB database, ranging in size from 25 to 257 residues. The median computation time for these cases is 1 min, and the assignment accuracy is >99%. These results demonstrate that complete search not only has the advantage of guaranteeing fair treatment of all feasible solutions, but is also efficient enough to be employed effectively in practice.
AVAILABILITY The MBA(2) software package is made available under an open-source software license. The datasets featured in the Results section can also be obtained from the contact author.
