Accurate and automated classification of protein secondary structure with PsiCSI

PsiCSI is a highly accurate, automated method for assigning secondary structure from NMR data, a useful intermediate step in the determination of tertiary structures. The method combines information from chemical shifts and protein sequence using three layers of neural networks. Training and testing were performed on a suite of 92 proteins (9437 residues) with known secondary and tertiary structure. Using a stringent cross‐validation procedure in which the target and its homologous proteins were removed from the databases used for training the neural networks, an average Q3 accuracy of 89% (per residue) was observed. This is an increase of 6.2 and 5.5 percentage points (representing 36% and 33% fewer errors) over methods that use chemical shifts (CSI) or sequence information (Psipred) alone, respectively. In addition, PsiCSI improves upon the translation of chemical shift information to secondary structure (Q3 = 87.4%) and is able to use sequence information as an effective substitute for sparse NMR data (Q3 = 86.9% without 13C shifts and Q3 = 86.8% with only Hα shifts available). Finally, errors made by PsiCSI almost exclusively involve the interchange of helix or strand with coil; helix and strand are rarely interchanged (<2.5 occurrences per 10000 residues). The automation, increased accuracy, absence of gross errors, and robustness with regard to sparse data make PsiCSI ideal for high‐throughput applications and should improve the effectiveness of hybrid NMR/de novo structure determination methods. A Web server is available for users to submit data and have the assignment returned.
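
The accuracy figures quoted above can be related to one another with a little arithmetic. The following is a minimal sketch, not the PsiCSI implementation: the function names are hypothetical, the three-state alphabet (`H` helix, `E` strand, `C` coil) is a common convention assumed here, and the baseline Q3 values of 82.8% (CSI) and 83.5% (Psipred) are not stated in the abstract but derived by subtracting the reported gains from 89%.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Per-residue three-state accuracy (Q3), as a percentage.

    Both arguments are strings over an assumed H/E/C alphabet, one
    character per residue, aligned position by position.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def relative_error_reduction(baseline_q3: float, improved_q3: float) -> float:
    """Percentage of the baseline's errors removed by the improved method."""
    baseline_error = 100.0 - baseline_q3
    improved_error = 100.0 - improved_q3
    return 100.0 * (baseline_error - improved_error) / baseline_error


def helix_strand_swaps(predicted: str, observed: str) -> int:
    """Count gross errors: helix assigned where strand was observed, or vice versa."""
    return sum({p, o} == {"H", "E"} for p, o in zip(predicted, observed))


if __name__ == "__main__":
    # Derived baselines (assumption): 89.0 - 6.2 = 82.8 and 89.0 - 5.5 = 83.5.
    print(round(relative_error_reduction(82.8, 89.0)))  # versus CSI
    print(round(relative_error_reduction(83.5, 89.0)))  # versus Psipred

    # Toy strings invented for illustration only; not data from the study.
    print(q3_accuracy("HHHEEECCC", "HHHEECCCC"))
    print(helix_strand_swaps("HHHEEECCC", "HHHEECCCC"))
```

Under these assumptions, the error reductions work out to roughly 36% and 33%, which matches the figures in the abstract and confirms that the 6.2 and 5.5 gains are differences in percentage points rather than relative improvements.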