Protein Structure from Contact Maps: A Case-Based Reasoning Approach

Determining the three-dimensional structure of a protein is an important step in understanding biological function. Despite advances in experimental methods (crystallography and NMR) and protein structure prediction techniques, the gap between the number of known protein sequences and determined structures continues to grow.

Approaches to protein structure prediction vary from those that apply physical principles to those that consider known amino acid sequences and previously determined protein structures. In this paper we consider a two-step approach to structure prediction: (1) predict contacts between amino acids using sequence data; (2) predict protein structure using the predicted contact maps. Our focus is on the second step of this approach. In particular, we apply a case-based reasoning framework to determine the alignment of secondary structures based on previous experiences stored in a case base, along with detailed knowledge of the chemical and physical properties of proteins. Case-based reasoning is founded on the premise that similar problems have similar solutions. Our hypothesis is that we can use previously determined structures and their contact maps to predict the structure for novel proteins from their contact maps.

The paper presents an overview of contact maps along with the general principles behind our methodology of case-based reasoning. We discuss details of the implementation of our system and present empirical results using contact maps retrieved from the Protein Data Bank.
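To make the two central ideas concrete, the following is a minimal sketch, not the system described in the paper. It assumes a contact map is the set of residue pairs whose representative atoms lie within a distance cutoff (the 8 Å default is an illustrative assumption), and it stands in for case retrieval with a simple Jaccard overlap between contact sets. The function names `contact_map`, `jaccard`, and `retrieve_most_similar` are hypothetical.

```python
from math import dist


def contact_map(coords, threshold=8.0):
    """Return the set of residue index pairs (i, j), i < j, closer than threshold.

    coords: one (x, y, z) tuple per residue (e.g. C-alpha positions, an assumption).
    threshold: distance cutoff in angstroms (illustrative default, not from the paper).
    """
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts


def jaccard(a, b):
    """Overlap between two contact sets: |a & b| / |a | b| (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def retrieve_most_similar(query, case_base):
    """Return the name of the stored case whose contact set best matches the query.

    case_base: dict mapping a case name to its contact set.
    Jaccard similarity is a stand-in; the paper's retrieval measure may differ.
    """
    return max(case_base, key=lambda name: jaccard(query, case_base[name]))


# Toy example: four residues spaced 3.8 angstroms apart along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
query = contact_map(coords)
cases = {"case_a": {(0, 1), (0, 2)}, "case_b": {(5, 6)}}
best = retrieve_most_similar(query, cases)
```

In this toy setting the retrieved case would then supply a previously solved structure to adapt, which is the "similar problems have similar solutions" premise in its simplest form.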
