The automatic discovery of structural principles describing protein fold space.

The study of protein structure has been driven largely by the careful inspection of experimental data by human experts. However, the rapid determination of protein structures by structural-genomics projects will make it increasingly difficult to analyse the distribution of proteins in fold space, and to determine the principles responsible for it, by inspection alone. Here, we demonstrate a machine-learning strategy that automatically determines the structural principles describing 45 folds. The rules learnt were shown to be both statistically significant and meaningful to protein experts. With the increasing emphasis on high-throughput experimental initiatives, machine learning and other automated methods of analysis will become increasingly important for many biological problems.
