Host and infectivity prediction of Wuhan 2019 novel coronavirus using deep learning algorithm

The recent outbreak of pneumonia in Wuhan, China, caused by the 2019 Novel Coronavirus (2019-nCoV), emphasizes the importance of detecting novel viruses and predicting their risk of infecting people. In this report, we introduce VHP (Virus Host Prediction), which predicts the potential hosts of viruses using a deep learning algorithm. Our prediction suggests that 2019-nCoV has infectivity close to that of other human coronaviruses, especially the severe acute respiratory syndrome coronavirus (SARS-CoV), Bat SARS-like Coronaviruses and the Middle East respiratory syndrome coronavirus (MERS-CoV). Compared with the coronaviruses infecting other vertebrates, bat coronaviruses are assigned infectivity patterns more similar to that of 2019-nCoV. Furthermore, by comparing the infectivity patterns of all viruses hosted on vertebrates, we found that mink viruses show an infectivity pattern closer to that of 2019-nCoV. These results of the infectivity pattern analysis indicate that bat and mink may be two candidate reservoirs of 2019-nCoV. They warrant caution regarding 2019-nCoV and guide further exploration of its properties and reservoir.

One Sentence Summary: It is of great value to identify whether a newly discovered virus poses a risk of infecting humans. Guo et al. proposed a deep learning-based virus host prediction method that takes a DNA sequence as input and detects what kind of host a virus can infect. Applied to the Wuhan 2019 Novel Coronavirus, the prediction demonstrated that several vertebrate-infectious coronaviruses have strong potential to infect humans. This method will be helpful in future viral analysis and in the early prevention and control of viral pathogens.
