Predicting reservoir hosts and arthropod vectors from evolutionary signatures in RNA virus genomes

Predicting hosts and vectors

During outbreaks of mysterious infections, events can rapidly become dangerous and confusing. A combination of increasing experience with outbreaks and genome-sequencing technology now means that the pathogen can often be identified within days. But for some of the most frightening viral pathogens, the originating hosts and possible vectors often remain obscure. Babayan et al. took sequence data from more than 500 single-stranded RNA viruses (see the Perspective by Woolhouse) and used machine-learning algorithms to extract evolutionary signals imprinted in the virus sequence. These signals offer information about the virus's original hosts and about whether an arthropod vector, and of what type, plays a part in its natural ecology.

Science, this issue p. 577; see also p. 524

Machine learning algorithms detect coevolutionary biases in viral genomes that predict hosts.

Identifying the animal origins of RNA viruses requires years of field and laboratory studies that stall responses to emerging infectious diseases. Using large genomic and ecological datasets, we demonstrate that animal reservoirs and the existence and identity of arthropod vectors can be predicted directly from viral genome sequences via machine learning. We illustrate the ability of these models to predict the epidemiology of diverse viruses across most human-infective families of single-stranded RNA viruses, including 69 viruses with previously elusive or never-investigated reservoirs or vectors. Models such as these, which capitalize on the proliferation of low-cost genomic sequencing, can narrow the time lag between virus discovery and targeted research, surveillance, and management.
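The abstract does not say which genomic features carry the "evolutionary signals" it describes. As an illustrative assumption only, the sketch below computes one widely used compositional signature, the dinucleotide odds ratio (observed dinucleotide frequency divided by the product of its two base frequencies), which could serve as an input feature vector for a classifier. The function name `dinucleotide_bias` and the example sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product
from typing import Dict

BASES = "ACGU"


def dinucleotide_bias(seq: str) -> Dict[str, float]:
    """Return the observed/expected ratio for each of the 16 dinucleotides.

    Hypothetical helper for illustration; not the authors' feature pipeline.
    DNA-style input is accepted and T is treated as U.
    """
    seq = seq.upper().replace("T", "U")
    # Count single bases, ignoring ambiguous characters such as N.
    mono = Counter(b for b in seq if b in BASES)
    # Count overlapping pairs in which both positions are unambiguous.
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_pairs = sum(pairs.values())

    bias = {}
    for x, y in product(BASES, repeat=2):
        expected = (mono[x] / n_mono) * (mono[y] / n_mono) if n_mono else 0.0
        observed = pairs[x + y] / n_pairs if n_pairs else 0.0
        # The ratio is undefined when either base is absent from the sequence.
        bias[x + y] = observed / expected if expected > 0 else float("nan")
    return bias


if __name__ == "__main__":
    # Toy sequence, not real viral data.
    features = dinucleotide_bias("AUGGCGUACGAUCGAUUAGC")
    for dinuc in sorted(features):
        print(dinuc, round(features[dinuc], 3))
```

A ratio well below 1 marks a dinucleotide that is under-represented relative to base composition, and a ratio above 1 marks over-representation. In a workflow like the one the abstract outlines, vectors of this kind, computed per genome, would be paired with known reservoir or vector labels to train a supervised model; that pairing is likewise an assumption here, not a description of the published method.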
