Identification of a covert evolutionary pathway between two protein folds

Although homologous protein sequences are expected to adopt similar structures, some amino acid substitutions can interconvert α-helices and β-sheets. Such fold switching may have occurred over evolutionary history, but supporting evidence has been limited by (1) the abundance and diversity of sequenced genes, (2) the quantity of experimentally determined protein structures, and (3) the assumptions underlying the statistical methods used to infer homology. Here, we overcame these barriers by applying multiple statistical methods to a family of ~600,000 bacterial response regulator proteins. We found that their homologous DNA-binding subunits assume divergent structures: helix-turn-helix versus α-helix+β-sheet (winged helix). Phylogenetic analyses, ancestral sequence reconstruction, and AlphaFold2 models indicated that amino acid substitutions facilitated a switch from helix-turn-helix to winged helix. This structural transformation likely expanded DNA-binding specificity. Our approach uncovers an evolutionary pathway between two protein folds and provides a methodology for identifying secondary structure switching in other protein families.
