Mining residue contacts in proteins using local structure predictions

In this paper we develop data mining techniques to predict 3D contact potentials among protein residues (amino acids), based on the hierarchical nucleation-propagation model of protein folding. We take a hybrid approach: a hidden Markov model first extracts folding initiation sites, and association mining then discovers contact potentials. This hybrid approach achieves better accuracy than previously reported results.
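
To make the association mining step more concrete, the following is a minimal sketch of how rules of the form "feature set implies contact" might be scored by support and confidence. It is an illustration under stated assumptions, not the method used in the paper: the function name `mine_contact_rules`, the feature strings (such as `aa1=L` or `state1=helix`), the thresholds, and the toy data are all hypothetical, the hidden Markov model step is not shown, and the brute-force enumeration stands in for a scalable mining algorithm.

```python
from itertools import combinations


def mine_contact_rules(pairs, min_support=2, min_confidence=0.6, max_len=2):
    """Score rules 'itemset -> contact' by support and confidence.

    pairs: list of (set of feature strings, bool in_contact), one entry per
    residue pair. All feature names are hypothetical placeholders.
    """
    counts = {}  # itemset (sorted tuple) -> [occurrences, occurrences in contact]
    for items, in_contact in pairs:
        # Brute-force enumeration of small itemsets; fine for a toy example only.
        for k in range(1, max_len + 1):
            for subset in combinations(sorted(items), k):
                entry = counts.setdefault(subset, [0, 0])
                entry[0] += 1
                entry[1] += int(in_contact)

    rules = []
    for itemset, (total, hits) in counts.items():
        confidence = hits / total  # fraction of matching pairs that are in contact
        if hits >= min_support and confidence >= min_confidence:
            rules.append((itemset, hits, confidence))

    # Highest confidence first, then highest support, then itemset for stability.
    rules.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules


# Toy residue pairs: amino acid types plus assumed local structure labels.
toy_pairs = [
    ({"aa1=L", "aa2=V", "state1=helix", "state2=strand"}, True),
    ({"aa1=L", "aa2=V", "state1=helix", "state2=helix"}, True),
    ({"aa1=L", "aa2=K", "state1=helix", "state2=strand"}, False),
    ({"aa1=G", "aa2=V", "state1=loop", "state2=strand"}, False),
    ({"aa1=L", "aa2=V", "state1=strand", "state2=strand"}, True),
]

rules = mine_contact_rules(toy_pairs)
for itemset, support, confidence in rules:
    print(itemset, support, round(confidence, 2))
```

In this sketch, support counts only the pairs that match the itemset and are in contact, and confidence is the fraction of matching pairs that are in contact. A real pipeline would take its local structure labels from the hidden Markov model output rather than from hand-written strings.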
