Applying neural networks to classify influenza virus antigenic types and hosts

Influenza viruses continue to evolve rapidly and are responsible for seasonal epidemics and occasional but catastrophic pandemics. We recently demonstrated that decision tree and support vector machine methods can classify pandemic swine flu viral strains with high accuracy. Here, we apply artificial neural networks to predict important influenza virus antigenic types (H1, H3, and H5) and hosts (Human, Avian, and Swine), addressing a critical need for a computational system for influenza surveillance. A comprehensive experiment across different k-mer lengths and different binary encoding types showed that classification based on the frequencies of k-mer nucleotide strings performed better than classification based on binary-transformed nucleotide data. We also found, for the first time, that classification accuracy varies from host to host and from gene segment to gene segment. In particular, human influenza viruses can be classified with higher accuracy than avian and swine viruses. This suggests that influenza strains may have become well adapted to their human host, so that less variation occurs among human viruses. Across genome segments, host classification accuracy is highest when the HA and NA segments are used to classify human hosts. This research, along with our previous studies, shows that machine learning techniques play an indispensable role in virus classification.
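The abstract compares two ways of turning a nucleotide sequence into numeric input for a neural network. The Python sketch below illustrates both under stated assumptions. `kmer_frequencies` counts overlapping k-mers and normalizes them to relative frequencies. `binary_encoding` is a hypothetical per-position one-hot scheme, since the abstract does not say which binary encodings were tested. `train_softmax` and `predict` are also hypothetical helpers: a single-layer softmax network trained by stochastic gradient descent on cross-entropy loss, a deliberately minimal stand-in rather than a reproduction of the networks used in the study. The sequences and labels in the usage example are toy data, not influenza sequences.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"


def kmer_frequencies(seq, k):
    """Relative frequency of each of the 4**k possible k-mers, in ACGT order.

    Windows containing ambiguous bases (e.g. N) are skipped; this handling
    is an assumption, not taken from the study.
    """
    index = {"".join(p): i for i, p in enumerate(product(NUCLEOTIDES, repeat=k))}
    counts = [0.0] * len(index)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def binary_encoding(seq, length):
    """Hypothetical per-position one-hot encoding (A=1000, C=0100, G=0010,
    T=0001), truncated or zero-padded to a fixed length."""
    vec = []
    for base in seq.upper()[:length].ljust(length, "-"):
        # Padding ("-") and ambiguous bases map to four zeros.
        vec.extend(1.0 if base == n else 0.0 for n in NUCLEOTIDES)
    return vec


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def train_softmax(X, y, n_classes, epochs=200, lr=0.5):
    """Hypothetical single-layer network: per-sample gradient descent on
    cross-entropy, where d(loss)/d(logit_c) = p_c - [c == label]."""
    n_features = len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            logits = [sum(w * xi for w, xi in zip(W[c], x)) + b[c]
                      for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                g = p[c] - (1.0 if c == label else 0.0)
                b[c] -= lr * g
                for j, xi in enumerate(x):
                    W[c][j] -= lr * g * xi
    return W, b


def predict(W, b, x):
    """Index of the class with the highest score."""
    scores = [sum(w * xi for w, xi in zip(Wc, x)) + bc for Wc, bc in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    hosts = ["Human", "Avian", "Swine"]          # toy labels for illustration
    seqs = ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG"]  # toy sequences, not real data
    X = [kmer_frequencies(s, 1) for s in seqs]
    W, b = train_softmax(X, [0, 1, 2], n_classes=3)
    print([hosts[predict(W, b, x)] for x in X])
```

One design difference is visible in the code: a k-mer frequency vector always has 4**k entries regardless of sequence length, whereas a per-position binary encoding must be truncated or padded to a common length. This may be one reason frequency features are convenient for segments of varying length, although the abstract does not give this as the explanation for its result.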
