Amino Acid Residue Environments and Predictions of Residue Type

Determining a protein's structure from knowledge of its linear amino acid sequence remains an important open problem and a bottleneck in interpreting the rapidly growing repository of genetic sequence data. One approach that has shown promise and achieved a measure of success is threading. In this approach, contact energies between different amino acids are first estimated by statistical methods applied to known structures. These contact energies are then applied to a sequence of unknown structure by threading it through various known structures and computing the total threading energy for each candidate structure. The structure that yields the lowest total energy is considered the leading candidate among all the structures tested. Additional information is often needed to support the results of threading studies, because the contact potentials used are known to be insufficiently sensitive to allow definitive conclusions. Here, we investigate the hypothesis that the environment of an amino acid residue, defined as all residues that are not local to it on the chain but are sufficiently close to it in space, can supply information predictive of the residue's type that is not adequately reflected in the individual contact energies. We present evidence that confirms this hypothesis and suggests a high-order cooperativity among the residues that surround a given residue in how they interact with it. We suggest a possible application to threading.
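The threading procedure and the notion of a residue environment described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 8.0 Å distance cutoff, the minimum chain separation of 3, the gapless one-to-one alignment of sequence to structure, and the toy energy table are all assumptions made for illustration.

```python
from itertools import combinations


def contacts(coords, cutoff=8.0, min_separation=3):
    """Index pairs (i, j) that are close in space but not local on the chain.

    cutoff and min_separation are illustrative assumptions, not values
    taken from the paper.
    """
    pairs = []
    for i, j in combinations(range(len(coords)), 2):
        if j - i < min_separation:
            continue  # skip residues that are neighbours along the chain
        dist = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) ** 0.5
        if dist <= cutoff:
            pairs.append((i, j))
    return pairs


def threading_energy(sequence, pairs, energy):
    """Sum pairwise contact energies for a sequence placed on a structure.

    Assumes a gapless alignment: residue k of the sequence occupies
    position k of the structure. Missing table entries count as 0.0.
    """
    total = 0.0
    for i, j in pairs:
        key = tuple(sorted((sequence[i], sequence[j])))
        total += energy.get(key, 0.0)
    return total


def best_structure(sequence, structures, energy):
    """Name of the candidate structure with the lowest threading energy."""
    return min(
        structures,
        key=lambda name: threading_energy(sequence, contacts(structures[name]), energy),
    )


def environment(sequence, pairs, position):
    """Sorted residue types in non-local spatial contact with one position."""
    env = []
    for i, j in pairs:
        if i == position:
            env.append(sequence[j])
        elif j == position:
            env.append(sequence[i])
    return sorted(env)
```

Given a dictionary of candidate coordinate lists and a table keyed by sorted residue-type pairs, `best_structure` returns the lowest-energy candidate, while `environment` collects the neighbouring residue types from which a residue-type predictor of the kind hypothesized above could be built.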
