Sequence and structure continuity of evolutionary importance improves protein functional site discovery and annotation

Protein functional sites control most biological processes and are important targets for drug design and protein engineering. To characterize them, the evolutionary trace (ET) ranks the relative importance of residues according to their evolutionary variations. Generally, top‐ranked residues cluster spatially to define evolutionary hotspots that predict functional sites in structures. Here, various functions that measure the physical continuity of ET ranks among neighboring residues in the structure, or in the sequence, are shown to inform sequence selection and to improve functional site resolution. This is shown first in 110 proteins, for which the significance of the overlap between top‐ranked residues and actual functional sites rose by 8%. Then, on a structural proteomic scale, optimized ET led to better 3D structure‐function motifs (3D templates) and, in turn, to enzyme function prediction by the Evolutionary Trace Annotation (ETA) method with better sensitivity (from 40% to 53%) and positive predictive value (from 93% to 94%). This suggests that the similarity of evolutionary importance among neighboring residues in the sequence and in the structure is a universal feature of protein evolution. In practice, this yields a tool for optimizing sequence selections for comparative analysis and, via ET, for better predictions of functional sites and function. This should prove useful for the efficient mutational redesign of protein function and for pharmaceutical targeting.
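The abstract does not give the form of the continuity functions, so the following is only a minimal illustrative sketch of what "continuity of ET ranks among neighboring residues" could look like. It assumes a simple score, the mean absolute rank difference between neighbors, where lower values mean smoother ranks. The function names, the 4.0 Å contact cutoff, and the toy coordinates are all hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def sequence_continuity(ranks):
    """Mean absolute ET-rank difference between residues adjacent in sequence.

    Hypothetical score: lower values mean neighboring residues have
    more similar evolutionary importance.
    """
    if len(ranks) < 2:
        raise ValueError("need at least two residues")
    diffs = [abs(a - b) for a, b in zip(ranks, ranks[1:])]
    return sum(diffs) / len(diffs)


def structure_continuity(ranks, coords, cutoff=4.0):
    """Mean absolute ET-rank difference between residues in spatial contact.

    Two residues count as neighbors when their representative atoms
    (one 3D point per residue) lie within `cutoff` angstroms.
    The cutoff value is an assumption made for this sketch.
    """
    if len(ranks) != len(coords):
        raise ValueError("ranks and coords must have the same length")
    pairs = [
        (i, j)
        for i, j in combinations(range(len(ranks)), 2)
        if dist(coords[i], coords[j]) <= cutoff
    ]
    if not pairs:
        raise ValueError("no residue pairs within the cutoff")
    return sum(abs(ranks[i] - ranks[j]) for i, j in pairs) / len(pairs)


# Toy data: six residues on a straight line, 3.8 angstroms apart,
# so only sequence-adjacent residues are in contact.
toy_coords = [(3.8 * k, 0.0, 0.0) for k in range(6)]
smooth_ranks = [1, 2, 3, 4, 5, 6]      # ranks vary gradually
scattered_ranks = [1, 6, 2, 5, 3, 4]   # ranks jump between neighbors

smooth_seq = sequence_continuity(smooth_ranks)
scattered_seq = sequence_continuity(scattered_ranks)
smooth_struct = structure_continuity(smooth_ranks, toy_coords)
```

Under this assumed score, a sequence selection whose trace yields a lower value would be preferred, which is one way such a measure could feed back into the choice of sequences for comparative analysis.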
