Objective identification of residue ranges for the superposition of protein structures

Background: The automated, objective selection of amino acid residue ranges for structure superpositions is important for meaningful and consistent protein structure analyses. So far there is no widely used standard for choosing these residue ranges for experimentally determined protein structures, and manual selection of residue ranges or the use of suboptimal criteria remains commonplace.

Results: We present an automated and objective method for finding amino acid residue ranges for the superposition and analysis of protein structures, in particular for structure bundles resulting from NMR structure calculations. The method is implemented in an algorithm, CYRANGE, that yields, without protein-specific parameter adjustment, appropriate residue ranges in most commonly occurring situations, including low-precision structure bundles, multi-domain proteins, symmetric multimers, and protein complexes. Residue ranges are chosen to comprise as many residues of a protein domain as possible, up to the point where adding further residues would lead to a steep rise in the RMSD value. Residue ranges are determined by first clustering residues into domains based on the distance variance matrix, and then refining, for each domain, the initial choice of residues by excluding residues one by one until the relative decrease of the RMSD value becomes insignificant. A penalty for the opening of gaps favours contiguous residue ranges in order to obtain a result that is as simple as possible, but not simpler. Results are given for a set of 37 proteins and compared with those of commonly used protein structure validation packages. We also provide residue ranges for 6351 NMR structures in the Protein Data Bank.

Conclusions: The CYRANGE method is capable of automatically determining residue ranges for the superposition of protein structure bundles for a large variety of protein structures. The method correctly identifies ordered regions. Global structure superpositions based on the CYRANGE residue ranges allow a clear presentation of the structure, and unnecessary small gaps within the selected ranges are absent. In the majority of cases, the residue ranges from CYRANGE contain fewer gaps and cover considerably larger parts of the sequence than those from other methods, without significantly increasing the RMSD values. CYRANGE thus provides an objective and automatic method for standardizing the choice of residue ranges for the superposition of protein structures.
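
The two ingredients named in the abstract, a distance variance matrix and a one-by-one exclusion of residues until the relative RMSD decrease becomes insignificant, can be illustrated with a minimal Python sketch. This is not the CYRANGE implementation: all function names are hypothetical, the bundle is assumed to be a NumPy array of one representative atom per residue (for example C-alpha), the RMSD is computed after fitting every model to the first one, the 10% stopping threshold is an arbitrary assumption, and the domain clustering step and the gap penalty are left out.

```python
import numpy as np


def distance_variance_matrix(bundle):
    """Variance over models of every residue-residue distance.

    bundle: array of shape (n_models, n_residues, 3), one atom per residue
    (an assumption of this sketch).
    """
    diff = bundle[:, :, None, :] - bundle[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n_models, n_res, n_res)
    return dist.var(axis=0)


def kabsch_superpose(mobile, target):
    """Least-squares fit of mobile onto target, both of shape (n, 3)."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return (mobile - mc) @ rot.T + tc


def bundle_rmsd(bundle, selected):
    """RMSD to the mean coordinates after fitting all models to model 0,
    using only the selected residue indices."""
    sub = bundle[:, list(selected), :]
    fitted = np.stack([kabsch_superpose(m, sub[0]) for m in sub])
    mean = fitted.mean(axis=0)
    return float(np.sqrt(((fitted - mean) ** 2).sum(axis=-1).mean()))


def refine_selection(bundle, selected, min_rel_drop=0.10, min_size=4):
    """Greedily drop the residue whose removal lowers the RMSD most.

    Stops when the relative RMSD decrease falls below min_rel_drop
    (an assumed threshold) or when min_size residues remain.
    No gap penalty is applied in this sketch.
    """
    selected = list(selected)
    current = bundle_rmsd(bundle, selected)
    while len(selected) > min_size:
        trials = [
            (bundle_rmsd(bundle, selected[:k] + selected[k + 1:]), k)
            for k in range(len(selected))
        ]
        best, k = min(trials)
        if current <= 0.0 or (current - best) / current < min_rel_drop:
            break
        del selected[k]
        current = best
    return selected
```

Called as `refine_selection(bundle, range(n_residues))`, the function returns the indices of the residues that remain once further exclusions no longer pay off. In a fuller treatment the starting selection for each domain would come from clustering the distance variance matrix rather than from the whole chain.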
