The NOESY jigsaw: automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data

High-throughput, data-directed computational protocols for Structural Genomics (or Proteomics) are required in order to evaluate the protein products of genes for structure and function at rates comparable to current gene-sequencing technology. This paper presents the JIGSAW algorithm, a novel high-throughput, automated approach to protein structure characterization with nuclear magnetic resonance (NMR). JIGSAW applies graph algorithms and probabilistic reasoning techniques, enforcing first-principles consistency rules in order to overcome a 5-10% signal-to-noise ratio. It consists of two main components: (1) graph-based secondary structure pattern identification in unassigned heteronuclear NMR data, and (2) assignment of spectral peaks by probabilistic alignment of identified secondary structure elements against the primary sequence. JIGSAW defers assignment until after secondary structure identification, which differs greatly from traditional approaches that begin by correlating peaks among dozens of experiments. By deferring assignment, JIGSAW not only eliminates this bottleneck but also allows the number of experiments to be reduced from dozens to four, none of which requires 13C-labeled protein. This in turn dramatically reduces the amount and expense of wet-lab molecular biology for protein expression and purification, as well as the total spectrometer time needed to collect data. Our results for three test proteins demonstrate that we are able to identify and align approximately 80 percent of α-helical and 60 percent of β-sheet structure. JIGSAW is very fast, running in minutes on a Pentium-class Linux workstation. This approach yields quick and reasonably accurate (as opposed to the traditional slow and extremely accurate) structure calculations, utilizing a suite of graph analysis algorithms to compensate for the data sparseness.
JIGSAW could be used for quick structural assays to speed data to the biologist early in the process of investigation, and could in principle be applied in an automated fashion to a large fraction of the proteome.
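
The second component, probabilistic alignment of an identified secondary structure element against the primary sequence, can be illustrated with a minimal sketch. This is not JIGSAW's actual scoring function: the function name `best_alignment`, the per-position residue-type probabilities, and the `floor` value are all assumptions made for illustration. The sketch slides a fragment along the sequence and keeps the window with the highest summed log-probability.

```python
import math


def best_alignment(sequence, fragment_probs, floor=1e-3):
    """Return (start, log_score) of the best-scoring window.

    sequence       -- primary sequence as a string of one-letter codes
    fragment_probs -- one dict per fragment position, mapping residue
                      type to an assumed probability (hypothetical input)
    floor          -- assumed probability for residue types not listed
    """
    best_start, best_score = None, -math.inf
    n = len(fragment_probs)
    # Try every ungapped placement of the fragment on the sequence.
    for start in range(len(sequence) - n + 1):
        score = 0.0
        for offset, probs in enumerate(fragment_probs):
            residue = sequence[start + offset]
            score += math.log(probs.get(residue, floor))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


# Hypothetical three-residue fragment aligned to a short sequence.
fragment = [{"L": 0.7, "V": 0.2}, {"A": 0.8}, {"A": 0.6, "G": 0.3}]
start, score = best_alignment("MKVLAAGEK", fragment)
print(start, round(score, 3))
```

In this toy input the fragment scores best at the window covering residues L, A, A. A fuller treatment would also need to handle gaps and competition between fragments for the same sequence positions, which this sketch ignores.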
