Protein Attributes Microtuning System (PAMS): an effective tool to increase protein structure prediction by data purification

Given the cost of direct experimental structure determination, predicting protein secondary structure from sequence alone with machine-learning schemes remains an important methodology. To achieve significant improvements in prediction accuracy, we have developed an automated tool that prepares very large biological datasets for use by the learning network. By focusing on improving data quality and validation, our experiments yielded a peak secondary structure prediction accuracy of 90.97%. An important additional aspect of this result is that the predictions rely on a template-free statistical modeling mechanism. The performance of each classifier is also evaluated and discussed. In this paper we propose a set of 232 protein chains for use in prediction. Our goal is to make the tools discussed available as services within a digital ecosystem that supports knowledge sharing amongst the protein structure prediction community.
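The abstract does not name the accuracy metric behind the 90.97% figure. Assuming it is the conventional three-state Q3 score, this minimal Python sketch shows how such a number could be computed. Q3 is the percentage of residues whose predicted helix/strand/coil state matches the observed one, pooled over all chains. The function name `q3`, the H/E/C alphabet and the pooling over chains are illustrative assumptions, not the PAMS implementation.

```python
from typing import Iterable, Tuple

# Assumed three-state alphabet: H = helix, E = strand, C = coil.
STATES = {"H", "E", "C"}


def q3(pairs: Iterable[Tuple[str, str]]) -> float:
    """Hypothetical Q3 helper: per-residue three-state accuracy (%) pooled over chains.

    Each pair is (predicted, observed) for one chain, written as strings over H/E/C.
    This is an illustrative sketch, not the metric code used in the paper.
    """
    correct = 0
    total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("prediction and observation lengths differ")
        if not set(predicted) <= STATES or not set(observed) <= STATES:
            raise ValueError("unexpected secondary structure symbol")
        # Count residues whose predicted state equals the observed state.
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to score")
    return 100.0 * correct / total


if __name__ == "__main__":
    chains = [("HHHCCEEC", "HHHCCEEE"), ("CCHH", "CCHH")]
    print(f"{q3(chains):.2f}")  # 11 of 12 residues correct -> 91.67
```

Pooling counts every residue equally, so long chains weigh more than short ones. Averaging per-chain scores instead would give a different figure for the same predictions.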
