Using multiple sequence correlation analysis to characterize functionally important protein regions.

Protein co-evolution under structural and functional constraints necessitates the preservation of important interactions. Identifying functionally important regions remains a significant obstacle in protein engineering efforts. In this paper, we present a bioinformatics-inspired approach (residue correlation analysis, RCA) for predicting functionally important domains from protein family sequence data. RCA comprises two major steps: (i) identifying pairs of residue positions that mutate in a coordinated manner, and (ii) using these results to identify protein regions that interact with an unusually high number of other residues. We hypothesize that strongly correlated pairs result not only from contacting pairs, but also from residues that participate in conformational changes during catalysis or in other interactions necessary for retaining functionality. The results show that highly mobile loops that assist in ligand association/dissociation tend to exhibit high correlation. RCA results exhibit good agreement with the findings of experimental and molecular dynamics studies for the three protein families analyzed: (i) DHFR (dihydrofolate reductase), (ii) cyclophilin, and (iii) formyl-transferase. Specifically, the specificity (percentage of correct predictions) in all three cases is substantially higher than that obtained using entropic measures or contacting residue pairs. In addition, we use our approach in a predictive fashion to identify important regions of a transmembrane amino acid transporter protein for which limited structural and functional information is available.
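
The two RCA steps described above can be illustrated with a minimal sketch. The abstract does not state which correlation measure is used, so this example substitutes mutual information between alignment columns as a stand-in. The function names, the threshold value, and the toy alignment are all hypothetical and serve only to show the general shape of the computation: score every pair of positions, keep the strongly correlated ones, and then count how many correlated partners each position has.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(msa, i):
    """Return the residues found at alignment position i."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (bits) between two alignment columns.

    Assumption: used here only as a stand-in correlation score;
    it is not necessarily the measure used in the paper.
    """
    n = len(col_a)
    pa = Counter(col_a)
    pb = Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi


def correlated_pairs(msa, threshold):
    """Step (i): position pairs whose score meets the threshold."""
    length = len(msa[0])
    pairs = []
    for i, j in combinations(range(length), 2):
        score = mutual_information(column(msa, i), column(msa, j))
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def partner_counts(pairs, length):
    """Step (ii): number of correlated partners per position."""
    counts = [0] * length
    for i, j, _ in pairs:
        counts[i] += 1
        counts[j] += 1
    return counts


# Hypothetical toy alignment: positions 1 and 3 vary together,
# positions 0 and 2 are fully conserved.
toy_msa = ["AKLD", "AKLD", "ARLE", "ARLE"]
pairs = correlated_pairs(toy_msa, threshold=0.5)
counts = partner_counts(pairs, len(toy_msa[0]))
```

In this toy case only positions 1 and 3 are flagged as a correlated pair, and fully conserved columns score zero because they carry no covariation signal. On a real protein family alignment, positions or contiguous regions with unusually high partner counts would be the candidates for functionally important regions.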
