Learning the language of viral evolution and escape

Viral mutation that escapes from human immunity remains a major obstacle to antiviral and vaccine development. While anticipating escape could aid rational therapeutic design, the complex rules governing viral escape are challenging to model. Here, we demonstrate an unprecedented ability to predict viral escape by using machine learning algorithms originally developed to model the complexity of human natural language. Our key conceptual advance is that predicting escape requires identifying mutations that preserve viral fitness, or “grammaticality,” and also induce high antigenic change, or “semantic change.” We develop viral language models for influenza hemagglutinin, HIV Env, and SARS-CoV-2 Spike that we use to construct antigenically meaningful semantic landscapes, perform completely unsupervised prediction of escape mutants, and learn structural escape patterns from sequence alone. More profoundly, we lay a promising conceptual bridge between natural language and viral evolution.

One sentence summary: Neural language models of semantic change and grammaticality enable unprecedented prediction of viral escape mutations.
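
The core idea in the abstract, that a candidate escape mutation should score well on both grammaticality and semantic change, can be illustrated with a small sketch. This is a hedged illustration only, not the paper's implementation: it assumes that a language model has already produced an embedding and a probability for each mutant, that semantic change is measured as an L1 distance between embeddings, and that the two criteria are combined by summing ranks with a weight `beta`. The function names, the mutant labels, and all numbers are hypothetical toy values.

```python
from typing import Dict, List, Sequence, Tuple


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences between two embeddings."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_ascending(values: Dict[str, float]) -> Dict[str, int]:
    """Assign rank 1 to the smallest value, so larger values get larger ranks."""
    ordered = sorted(values, key=lambda name: values[name])
    return {name: i + 1 for i, name in enumerate(ordered)}


def prioritize(
    wt_embedding: Sequence[float],
    mutants: Dict[str, Tuple[Sequence[float], float]],
    beta: float = 1.0,
) -> List[Tuple[str, float]]:
    """Order mutants by rank(semantic change) + beta * rank(grammaticality).

    The rank-sum rule and `beta` are assumptions made for this sketch.
    """
    semantic = {m: l1_distance(wt_embedding, emb) for m, (emb, _) in mutants.items()}
    grammar = {m: prob for m, (_, prob) in mutants.items()}
    sem_rank = rank_ascending(semantic)
    gram_rank = rank_ascending(grammar)
    score = {m: sem_rank[m] + beta * gram_rank[m] for m in mutants}
    # Highest combined score first.
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy inputs: (embedding, model probability) per mutant.
wild_type = [0.0, 0.0]
toy_mutants = {
    "A1G": ([0.1, 0.0], 0.30),  # likely under the model, but barely changes the embedding
    "K2E": ([0.9, 0.8], 0.25),  # fairly likely and a large embedding change
    "W3P": ([1.5, 1.2], 0.01),  # largest embedding change, but very unlikely
    "D4N": ([0.2, 0.1], 0.05),  # small change and unlikely
}

ranked = prioritize(wild_type, toy_mutants)
for name, value in ranked:
    print(name, value)
```

In this toy setting the mutant that is strong on both criteria ranks first, while mutants that are strong on only one criterion fall behind it. Ranks are used rather than raw values so that the two quantities, which live on different scales, contribute comparably.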
