A structure and evolution-guided Monte Carlo sequence selection strategy for multiple alignment-based analysis of proteins

MOTIVATION Various methods based on multiple sequence alignments have been proposed to detect functional surfaces in proteins, such as active sites or protein interfaces. The effect that the choice of sequences has on the conclusions of such analyses has seldom been discussed. In particular, no method has been evaluated for its ability to optimize the sequence selection for the reliable detection of functional surfaces. RESULTS Here we propose, for proteins of known structure, a heuristic Metropolis Monte Carlo strategy that selects sequences from a large set of homologues in order to improve the detection of functional surfaces. The quantity guiding the optimization is the clustering of residues that are under increased evolutionary pressure, as estimated from the sample of sequences under consideration. We show that this strategy either improves the overlap of the prediction with known functional surfaces relative to selection by sequence similarity criteria, or matches the quality of prediction obtained through more elaborate non-structure-based methods of sequence selection. For demonstration we use a set of 50 homodimerizing enzymes that were co-crystallized with their substrates and cofactors.
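
The selection strategy summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the scoring function, the move set, or the temperature schedule, so everything below is an assumption made for illustration. In particular, the function names (`column_entropy`, `clustering_score`, `metropolis_select`), the use of column entropy as a stand-in for evolutionary pressure, the count of close residue pairs as a stand-in for clustering, the single-swap move, and the parameters `top_k`, `cutoff` and `temperature` are all hypothetical.

```python
import math
import random
from collections import Counter


def column_entropy(column):
    """Shannon entropy of one alignment column (lower = more conserved)."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def clustering_score(selected, alignment, coords, top_k=3, cutoff=6.0):
    """Toy clustering measure (an assumption, not the published score):
    number of pairs among the top_k most conserved columns whose
    residues lie within `cutoff` of each other in the structure."""
    rows = [alignment[i] for i in sorted(selected)]
    length = len(rows[0])
    entropies = [column_entropy([r[j] for r in rows]) for j in range(length)]
    # Ties are broken by column index so the ranking is deterministic.
    top = sorted(range(length), key=lambda j: (entropies[j], j))[:top_k]
    return sum(
        1
        for a in range(len(top))
        for b in range(a + 1, len(top))
        if math.dist(coords[top[a]], coords[top[b]]) <= cutoff
    )


def metropolis_select(alignment, coords, subset_size,
                      steps=2000, temperature=0.5, seed=0):
    """Metropolis search over fixed-size subsets of aligned sequences."""
    n = len(alignment)
    if not 0 < subset_size < n:
        raise ValueError("subset_size must be between 1 and n - 1")
    rng = random.Random(seed)
    current = set(rng.sample(range(n), subset_size))
    current_score = clustering_score(current, alignment, coords)
    best, best_score = set(current), current_score
    for _ in range(steps):
        # Propose swapping one selected sequence for one unselected sequence.
        out_idx = rng.choice(sorted(current))
        in_idx = rng.choice([i for i in range(n) if i not in current])
        proposal = (current - {out_idx}) | {in_idx}
        score = clustering_score(proposal, alignment, coords)
        delta = score - current_score
        # Metropolis criterion: always accept improvements, and accept
        # a worse subset with probability exp(delta / temperature).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = set(proposal), score
    return sorted(best), best_score
```

A call such as `metropolis_select(alignment, coords, subset_size=20)` would take `alignment` as a list of equal-length aligned sequences and `coords` as one 3D coordinate per alignment column, and return the indices of the best-scoring subset found together with its score. In a realistic setting the sequence of the protein with known structure would presumably be kept in every subset; the sketch omits this for brevity.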
