Learning and alignment methods applied to protein structure prediction.

Learning techniques can extract structural knowledge specific to a selected set of proteins. We describe two algorithms that optimize scores expressing the propensity of a polypeptide sequence to adopt a local fold. The first algorithm generates secondary structure prediction rules based on a dictionary of geometrical patterns frequently found in the learning database. The second algorithm produces scores that indicate the fit between an amino acid and a given local structural environment. Dynamic programming is then used to align structural information profiles, with the local mutation cost modified by these learned functions. The main features of the system are illustrated by the structural prediction of the N-terminal domain of the CD4 antigen. The usefulness of additional 3-D information in the alignment is then benchmarked on eight pairs of weakly homologous proteins.
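
The alignment step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard Needleman-Wunsch global alignment with a linear gap penalty, and it assumes the learned residue/environment fit is simply added to the mutation score with a weight. The names `align_score`, `toy_mutation`, `toy_env_fit`, the one-letter environment codes ("B" buried, "E" exposed), and all numeric values are hypothetical.

```python
from typing import Callable, Sequence

HYDROPHOBIC = set("AVLIMFW")  # toy residue class, chosen for illustration only


def toy_mutation(a: str, b: str) -> float:
    """Hypothetical mutation score: +1 for identity, -1 otherwise."""
    return 1.0 if a == b else -1.0


def toy_env_fit(residue: str, env: str) -> float:
    """Hypothetical stand-in for a learned residue/environment fit score.

    Rewards hydrophobic residues in buried ("B") positions and
    non-hydrophobic residues elsewhere.
    """
    return 1.0 if (residue in HYDROPHOBIC) == (env == "B") else -1.0


def align_score(
    seq_a: str,
    seq_b: str,
    envs_b: Sequence[str],
    mutation: Callable[[str, str], float] = toy_mutation,
    env_fit: Callable[[str, str], float] = toy_env_fit,
    weight: float = 1.0,
    gap: float = -2.0,
) -> float:
    """Global alignment score of seq_a against seq_b.

    envs_b[j] is the local structural environment of position j in seq_b
    (the protein of known structure). The local cost of matching residue a
    to position j is mutation(a, b) + weight * env_fit(a, envs_b[j]);
    this additive form is an assumption of the sketch.
    """
    if len(envs_b) != len(seq_b):
        raise ValueError("envs_b must have one entry per residue of seq_b")
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = best score aligning seq_a[:i] with seq_b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b, env = seq_a[i - 1], seq_b[j - 1], envs_b[j - 1]
            local = mutation(a, b) + weight * env_fit(a, env)
            dp[i][j] = max(
                dp[i - 1][j - 1] + local,  # align a with b
                dp[i - 1][j] + gap,        # gap in seq_b
                dp[i][j - 1] + gap,        # gap in seq_a
            )
    return dp[n][m]
```

Setting `weight=0.0` reduces the sketch to a purely sequence-based alignment, which makes it easy to compare scores with and without the structural term.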
