A decision graph explanation of protein secondary structure prediction

Decision graphs, a machine-learning technique that generalizes decision trees, are applied to the prediction of protein secondary structure to infer a theory for this problem. The resulting decision graph provides both a prediction method and an explanation for the problem. Many decision graphs are possible, and any particular graph is just one theory or hypothesis of secondary structure formation. Minimum message length (MML) encoding is used to judge the quality of the different theories; it is a general technique of inductive inference and is resistant to learning the noise in the training data. The method was applied to 75 sequences from nonhomologous proteins comprising 13K amino acids. The predictive accuracy for three states (extended, helix, other) was in the range achieved by current methods.
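
As a rough illustration of how minimum message length encoding can rank competing theories, the Python sketch below compares the cost of leaving a set of residues in one leaf against the cost of splitting it. It is not the coding scheme of the paper: the function names, the adaptive (uniform-prior) code for the class labels, and the flat `structure_bits` charge for describing a split are all assumptions made for the example.

```python
import math


def leaf_message_length(counts):
    """Bits needed to transmit the class labels at one leaf.

    Assumes an adaptive code with a uniform prior over the classes, which
    gives log2((n + k - 1)! / ((k - 1)! * prod(c_i!))) for n items in k classes.
    """
    n = sum(counts)
    k = len(counts)
    nats = (math.lgamma(n + k) - math.lgamma(k)
            - sum(math.lgamma(c + 1) for c in counts))
    return nats / math.log(2)


def split_message_length(children, structure_bits):
    """Two-part message: cost of stating the split, then the labels in each child.

    structure_bits is a hypothetical placeholder for the cost of describing
    the extra graph structure; a real scheme would derive it from the graph.
    """
    return structure_bits + sum(leaf_message_length(c) for c in children)


# Class counts are ordered (extended, helix, other); the numbers are invented.
unsplit = leaf_message_length([50, 50, 0])
informative = split_message_length([[50, 0, 0], [0, 50, 0]], structure_bits=10)
uninformative = split_message_length([[25, 25, 0], [25, 25, 0]], structure_bits=10)

print(f"no split:            {unsplit:.1f} bits")
print(f"informative split:   {informative:.1f} bits")
print(f"uninformative split: {uninformative:.1f} bits")
```

Under these assumptions, a split that separates the classes shortens the total message, while a split that leaves the class mix unchanged lengthens it because its structure cost is not repaid. This is the sense in which a message-length criterion discourages fitting noise.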
