Native secondary structure topology has near minimum contact energy among all possible geometrically constrained topologies

Secondary structure topology in this article refers to the order and direction of the secondary structures, such as helices and strands, with respect to the protein sequence. Even when the locations of the secondary structure Cα atoms are known, there are still (N! 2^N)(M! 2^M) possible topologies for a protein with N helices and M strands. This work explored whether the native topology is likely to be identified among the large set of all possible geometrically constrained topologies by evaluating the residue contact energy formed by the secondary structures alone, rather than by the entire chain. We developed a contact-pair-specific and distance-specific multiwell function based on a statistical characterization of the side chain distances in 413 proteins from the Protein Data Bank. The multiwell function has parameters specific to each of the 210 residue contact pairs. We illustrated a general mathematical method for extending a single-well function to a multiwell function that represents the statistical data. We performed a mutation analysis on 50 proteins to generate all possible geometrically constrained topologies of the secondary structures. The results show that the native topology is within the top 25% of the list ranked by the effective contact energies of the secondary structures for all 50 proteins, and within the top 5% for 34 proteins. As an application, the method was used to derive the structure of the skeletons from a low-resolution density map of the kind that can be obtained through electron cryomicroscopy. Proteins 2009. © 2009 Wiley‐Liss, Inc.
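
The size of the topology search space quoted above can be checked with a short calculation. The following is a minimal Python sketch, not the authors' code: it assumes that each of the N helix positions (and, separately, each of the M strand positions) can be assigned to any sequence segment of the same type (the factorial term) and traversed in either direction (the power-of-two term). The function names `count_topologies` and `enumerate_assignments` are hypothetical and introduced only for illustration.

```python
from itertools import permutations, product
from math import factorial


def count_topologies(n_helices: int, m_strands: int) -> int:
    """Return (N! * 2^N) * (M! * 2^M), the topology count from the abstract."""
    helix_part = factorial(n_helices) * 2 ** n_helices
    strand_part = factorial(m_strands) * 2 ** m_strands
    return helix_part * strand_part


def enumerate_assignments(k: int):
    """Yield every (order, directions) pair for k elements of one type.

    order      : a permutation mapping spatial positions to sequence segments
    directions : +1 or -1 per position, i.e. which way the chain runs through it
    """
    for order in permutations(range(k)):
        for directions in product((+1, -1), repeat=k):
            yield order, directions


if __name__ == "__main__":
    # A small protein with 3 helices and 2 strands.
    print(count_topologies(3, 2))
    # Brute-force enumeration for the helices alone agrees with 3! * 2^3.
    print(sum(1 for _ in enumerate_assignments(3)))
```

Because the count grows factorially, exhaustive enumeration of this kind is only practical for small N and M. The geometric constraints mentioned in the abstract reduce the set that actually needs to be scored, but this sketch does not model them.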
