Quality Assessment of Protein Models

Quality assessment (QA) aims to judge the quality of a protein model without knowledge of its native structure. It plays an important role in protein structure prediction by guiding the selection of an appropriate combination of templates out of many possibilities [1]. We formulate the QA problem as one of mapping and regression. Each protein model is mapped to a point in R^n by extracting n features from it, and the space is divided into subspaces according to the feature values, each representing the models' closeness (between 0 and 1) to the native structure. As feature values, we measure the degree of agreement between a property predicted from sequence analysis and the same property calculated from the 3D model. The properties we use include secondary structure, solvent accessibility, hydrophobicity, and energy components from MODELLER [2], DFIRE [3], and TASSER [4]. One advantage of using such consensus-type features is that models of similar quality tend to cluster together in R^n and are therefore well classified. Closeness values are assigned to regions of R^n by applying support vector machine (SVM) regression to decoy structures generated during CASP7, for which the native structures are known [1]. The SVM we used is non-linear and can therefore handle cases that are not linearly separable. In other words, for a given set of features, the SVM is more effective than a linear method such as linear programming. The proposed method was used to select final models during the CASP8 prediction. We will discuss both the usefulness and the limitations of the method for better protein modeling.
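The mapping step can be illustrated with a minimal sketch, assuming two of the properties named above: secondary structure, compared as per-residue state strings, and solvent accessibility, compared as per-residue relative values. The function names, the choice of fraction-of-agreement and Pearson correlation as agreement measures, and the toy inputs are all hypothetical; the abstract does not specify how each feature is computed, and the hydrophobicity and energy terms are omitted here.

```python
from math import sqrt


def fraction_agreement(predicted, observed):
    """Fraction of residues whose predicted and model-derived states match."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    if not predicted:
        return 0.0
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    if n != len(ys) or n == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def feature_vector(pred_ss, model_ss, pred_sa, model_sa):
    """Map one model to a point in R^2 (assumed features, for illustration)."""
    return [
        fraction_agreement(pred_ss, model_ss),  # secondary-structure agreement
        pearson(pred_sa, model_sa),             # solvent-accessibility agreement
    ]


# Toy example: 10 residues, one secondary-structure mismatch.
features = feature_vector(
    "HHHHEEEECC", "HHHCEEEECC",
    [0.1, 0.5, 0.9, 0.3, 0.2, 0.8, 0.7, 0.4, 0.6, 0.5],
    [0.2, 0.4, 0.8, 0.3, 0.1, 0.9, 0.6, 0.5, 0.6, 0.4],
)
print(features)
```

In a pipeline like the one described, vectors of this kind computed for decoys with known native structures would serve as training inputs to a non-linear regressor, with the closeness to the native structure as the target.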

[1]  A. Elofsson,et al.  Can correct protein models be identified? , 2003, Protein science : a publication of the Protein Society.

[2]  J F Gibrat,et al.  Surprising similarities in structure comparison. , 1996, Current opinion in structural biology.

[3]  Neal J. Zondlo Non-covalent interactions: Fold globally, bond locally. , 2010, Nature chemical biology.

[4]  D. Rubinsztein,et al.  Polyalanine and polyserine frameshift products in Huntington’s disease , 2006, Journal of Medical Genetics.

[5]  Björn Wallner,et al.  Model quality assessment for membrane proteins , 2010, Bioinform..

[6]  Andrej Sali,et al.  Comparative Protein Structure Modeling and its Applications to Drug Discovery , 2004 .

[7]  Massimiliano Pontil,et al.  PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments , 2012, Bioinform..

[8]  B. de Kruijff,et al.  Lipid polymorphism and the functional roles of lipids in biological membranes. , 1979, Biochimica et biophysica acta.

[9]  Jonathan Pevsner,et al.  Basic Local Alignment Search Tool (BLAST) , 2005 .

[10]  Yang Zhang,et al.  Structure Modeling of All Identified G Protein–Coupled Receptors in the Human Genome , 2006, PLoS Comput. Biol..

[11]  J. Kendrew,et al.  A Three-Dimensional Model of the Myoglobin Molecule Obtained by X-Ray Analysis , 1958, Nature.

[12]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[13]  M. Karplus,et al.  Crystallographic R Factor Refinement by Molecular Dynamics , 1987, Science.

[14]  R Abagyan,et al.  A new method for modeling large‐scale rearrangements of protein domains , 1997, Proteins.

[15]  D. Eisenberg,et al.  Assessment of protein models with three-dimensional profiles , 1992, Nature.

[16]  Jens Meiler,et al.  CASP6 assessment of contact prediction , 2005, Proteins.

[17]  Golan Yona,et al.  Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. , 2002, Journal of molecular biology.

[18]  Kaj Linderstrøm-Lang,et al.  Lane medical lectures : Proteins and enzymes , 2016 .

[19]  Ingo Brigandt,et al.  Homology and the origin of correspondence , 2002 .

[20]  T. Blundell,et al.  Comparative protein modelling by satisfaction of spatial restraints. , 1993, Journal of molecular biology.

[21]  T Reichhardt,et al.  It's sink or swim as a tidal wave of data approaches , 1999, Nature.

[22]  Scott R. Presnell,et al.  Artificial neural networks for pattern recognition in biochemical sequences. , 1993, Annual review of biophysics and biomolecular structure.

[23]  J. Parkhill,et al.  Comparative genomic structure of prokaryotes. , 2004, Annual review of genetics.

[24]  Pauline Hogeweg,et al.  Simulating the growth of cellular forms , 1978 .

[25]  Mihalis Yannakakis,et al.  On the Complexity of Protein Folding , 1998, J. Comput. Biol..

[26]  M. Sippl Recognition of errors in three‐dimensional structures of proteins , 1993, Proteins.

[27]  Sheena E Radford,et al.  Intermediates: ubiquitous species on folding energy landscapes? , 2007, Current opinion in structural biology.

[28]  K. Dill,et al.  From Levinthal to pathways to funnels , 1997, Nature Structural Biology.

[29]  P Fariselli,et al.  Prediction of contact maps with neural networks and correlated mutations. , 2001, Protein engineering.

[30]  R. Unger,et al.  Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implications. , 1993, Bulletin of mathematical biology.

[31]  Ming Li,et al.  Consensus fold recognition by predicted model quality , 2005, APBC.

[32]  Christus,et al.  A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins , 2022 .

[33]  P. Argos,et al.  Rotamers: to be or not to be? An analysis of amino acid side-chain conformations in globular proteins. , 1993, Journal of molecular biology.

[34]  Paulien Hogeweg,et al.  The Roots of Bioinformatics in Theoretical Biology , 2011, PLoS Comput. Biol..

[35]  A. Godzik,et al.  Comparison of sequence profiles. Strategies for structural predictions using sequence information , 2008, Protein science : a publication of the Protein Society.

[36]  M S Waterman,et al.  Identification of common molecular subsequences. , 1981, Journal of molecular biology.

[37]  C. Anfinsen Principles that govern the folding of protein chains. , 1973, Science.

[38]  Ming-Jing Hwang,et al.  Protein structure comparison by probability-based matching of secondary structure elements , 2003, Bioinform..

[39]  S. Singer,et al.  The fluid mosaic model of the structure of cell membranes. , 1972, Science.

[40]  Dieter Jahn,et al.  Combination of a data warehouse concept with web services for the establishment of the Pseudomonas systems biology database SYSTOMONAS , 2007, J. Integr. Bioinform..

[41]  C. H. Walker The Hydrophobic Effect: Formation of Micelles and Biological Membranes , 1981 .

[42]  Terri K. Attwood,et al.  The PRINTS Database: A Resource for Identification of Protein Families , 2002, Briefings Bioinform..

[43]  A. Sali,et al.  Protein Structure Prediction and Structural Genomics , 2001, Science.

[44]  Sorin Draghici,et al.  Machine Learning and Its Applications to Biology , 2007, PLoS Comput. Biol..

[45]  P. Ponganis,et al.  The physiological basis of diving to depth: birds and mammals. , 1998, Annual review of physiology.

[46]  R. Jakob,et al.  Energetic coupling between native-state prolyl isomerization and conformational protein folding. , 2008, Journal of molecular biology.

[47]  Richard Bonneau,et al.  Ab initio protein structure prediction of CASP III targets using ROSETTA , 1999, Proteins.

[48]  B. Barnhart,et al.  The Department of Energy (DOE) Human Genome Initiative. , 1989, Genomics.

[49]  Mouse Genome Sequencing Consortium Initial sequencing and comparative analysis of the mouse genome , 2002, Nature.

[50]  J. Onuchic,et al.  Theory of protein folding: the energy landscape perspective. , 1997, Annual review of physical chemistry.

[51]  David Baker,et al.  Protein Structure Prediction Using Rosetta , 2004, Numerical Computer Methods, Part D.

[52]  Georges Trinquier,et al.  Estimating the "steric clash" at cis peptide bonds. , 2008, The journal of physical chemistry. B.

[53]  J Lundström,et al.  Pcons: A neural‐network–based consensus predictor that improves fold recognition , 2001, Protein science : a publication of the Protein Society.

[54]  C. Sander,et al.  Direct-coupling analysis of residue coevolution captures native contacts across many protein families , 2011, Proceedings of the National Academy of Sciences.

[55]  K. Burrage,et al.  Protein contact prediction using patterns of correlation , 2004, Proteins.

[56]  C. Levinthal Are there pathways for protein folding , 1968 .

[57]  Yu-Dong Cai,et al.  Support Vector Machines for predicting protein structural class , 2001, BMC Bioinformatics.

[58]  Yuzuru Suzuki,et al.  A strong correlation between the increase in number of proline residues and the rise in thermostability of five Bacillus oligo-1,6-glucosidases , 1987, Applied Microbiology and Biotechnology.

[59]  International Human Genome Sequencing Consortium Initial sequencing and analysis of the human genome , 2001, Nature.

[60]  Concha Bielza,et al.  Machine Learning in Bioinformatics , 2008, Encyclopedia of Database Systems.

[61]  C Sander,et al.  Mapping the Protein Universe , 1996, Science.

[62]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[63]  C. Goose,et al.  Glossary of Terms , 2004, Machine Learning.

[64]  A. Chernov,et al.  Protein crystals and their growth. , 2003, Journal of structural biology.

[65]  Keehyoung Joo,et al.  High accuracy template based modeling by global optimization , 2007, Proteins.

[66]  D T Jones,et al.  Protein secondary structure prediction based on position-specific scoring matrices. , 1999, Journal of molecular biology.

[67]  G. Stormo,et al.  Correlated mutations in models of protein sequences: phylogenetic and structural effects , 1999 .

[68]  D. Mount Bioinformatics: Sequence and Genome Analysis , 2001 .

[69]  M. O. Dayhoff,et al.  Atlas of protein sequence and structure , 1965 .

[70]  Roland L. Dunbrack Rotamer libraries in the 21st century. , 2002, Current opinion in structural biology.

[71]  Marcin J. Skwark,et al.  Improved predictions by Pcons.net using multiple templates , 2011, Bioinform..

[72]  M. Gerstein,et al.  What is bioinformatics ? An introduction and overview , 2001 .

[73]  Ingo Brigandt,et al.  The importance of homology for biology and philosophy , 2007 .

[74]  W S McCulloch,et al.  A logical calculus of the ideas immanent in nervous activity , 1990, The Philosophy of Artificial Intelligence.

[75]  Yang Zhang Protein structure prediction: when is it useful? , 2009, Current opinion in structural biology.

[76]  Junwen Wang,et al.  Predictive models for protein crystallization. , 2004, Methods.

[77]  S. Henikoff,et al.  Amino acid substitution matrices from protein blocks. , 1992, Proceedings of the National Academy of Sciences of the United States of America.

[78]  G. N. Ramachandran,et al.  Stereochemistry of polypeptide chain configurations. , 1963, Journal of molecular biology.

[79]  M. Luckey,et al.  Membrane Structural Biology: With Biochemical and Biophysical Foundations , 2008 .

[80]  Yang Zhang,et al.  I-TASSER server for protein 3D structure prediction , 2008, BMC Bioinformatics.

[81]  Ron Shamir,et al.  Artificial Intelligence and Heuristic Methods in Bioinformatics , 2003 .

[82]  Colin N. Dewey,et al.  Initial sequencing and comparative analysis of the mouse genome. , 2002 .

[83]  Angela E. Douglas On the Origin of the Eukaryotes , 2012 .

[84]  Roland L. Dunbrack,et al.  Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. , 1997, Journal of molecular biology.

[85]  R. Stein Mechanism of enzymatic and nonenzymatic prolyl cis-trans isomerization. , 1993, Advances in protein chemistry.

[86]  K. Fidelis,et al.  Protein structure prediction and model quality assessment. , 2009, Drug discovery today.

[87]  Florian Markowetz,et al.  Support Vector Machines in Bioinformatics , 2002 .

[88]  G. P. Moss Basic terminology of stereochemistry (IUPAC Recommendations 1996) , 1996 .

[89]  L. Pauling,et al.  Configurations of Polypeptide Chains With Favored Orientations Around Single Bonds: Two New Pleated Sheets. , 1951, Proceedings of the National Academy of Sciences of the United States of America.

[90]  B. Rost,et al.  Combining evolutionary information and neural networks to predict protein secondary structure , 1994, Proteins.

[91]  J. Onuchic,et al.  Investigation of routes and funnels in protein folding by free energy functional methods. , 2000, Proceedings of the National Academy of Sciences of the United States of America.

[92]  Hongyi Zhou,et al.  Distance‐scaled, finite ideal‐gas reference state improves structure‐derived potentials of mean force for structure selection and stability prediction , 2002, Protein science : a publication of the Protein Society.

[93]  Joe Marks,et al.  Computational Complexity, Protein Structure Prediction, and the Levinthal Paradox , 1994 .

[94]  N. A. Solov'eva,et al.  Structures and Functions of Chaperones and Chaperonins (Review) , 2004, Applied Biochemistry and Microbiology.

[95]  Richard Bonneau,et al.  Ab initio protein structure prediction: progress and prospects. , 2001, Annual review of biophysics and biomolecular structure.

[96]  M. Gribskov,et al.  Identification of Sequence Patterns with Profile Analysis , 1996 .

[97]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[98]  B. Rost Twilight zone of protein sequence alignments. , 1999, Protein engineering.

[99]  I. Weinstein,et al.  Fidelity in protein synthesis: proline miscoding in a thermophile system. , 1965, Biochemical and biophysical research communications.

[100]  Zhirong Sun,et al.  Support vector machine approach for protein subcellular localization prediction , 2001, Bioinform..

[101]  David T. Jones,et al.  Protein topology from predicted residue contacts , 2012, Protein science : a publication of the Protein Society.

[102]  Roland L. Dunbrack,et al.  Conformational analysis of the backbone-dependent rotamer preferences of protein sidechains , 1994, Nature Structural Biology.

[103]  Eugene I Shakhnovich,et al.  Understanding protein evolution: from protein physics to Darwinian selection. , 2008, Annual review of physical chemistry.

[104]  M J Sternberg,et al.  Prediction of structural and functional features of protein and nucleic acid sequences by artificial neural networks. , 1992, Biochemistry.

[105]  Thomas M. Cover,et al.  Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition , 1965, IEEE Trans. Electron. Comput..

[106]  Arthur M. Lesk,et al.  Quantitative sequence-function relationships in proteins based on gene ontology , 2007, BMC Bioinformatics.

[107]  Andreas Möglich,et al.  Effect of proline and glycine residues on dynamics and barriers of loop formation in polypeptide chains. , 2005, Journal of the American Chemical Society.

[108]  David A. Fenstermacher,et al.  Introduction to bioinformatics , 2005, J. Assoc. Inf. Sci. Technol..