Predicting host tropism of influenza A virus proteins using random forest

Background: The majority of influenza A viruses reside and circulate among animal populations and seldom infect humans because of host range restriction. Yet when some avian strains acquire the ability to overcome the species barrier, they may become adapted to humans, replicating efficiently and causing disease, with the potential to trigger a pandemic. Given the huge influenza A virus reservoir in wild birds, the emergence of a new strain able to cross the host species barrier is a cause for concern, as the recent H7N9 outbreak in China has shown. Several influenza proteins have been shown to be major determinants of host tropism. A better understanding of host tropism, and the ability to determine it, would be important for identifying zoonotic influenza strains capable of crossing the species barrier and infecting humans.

Results: In this study, computational models for 11 influenza proteins were constructed with the random forest machine learning algorithm to predict host tropism. The models were trained on influenza protein sequences isolated from both avian and human samples, which were transformed into feature vectors of amino acid physicochemical properties. The resulting models predict the host tropism of individual influenza proteins with high accuracy (ACC > 96.57%; AUC > 0.980; MCC > 0.916). In addition, features from all 11 proteins were used to construct a combined model that predicts the host tropism of influenza virus strains, which would help assess the host range capability of a novel influenza strain.

Conclusions: All of the constructed prediction models achieved high performance, indicating clear distinctions between avian and human proteins. When the models are used together as a host tropism prediction system, zoonotic strains could potentially be identified from differing prediction results across their proteins.
Understanding and predicting the host tropism of influenza proteins lays an important foundation for future work on computational models capable of directly predicting interspecies transmission of influenza viruses. The models are available for prediction at http://fluleap.bic.nus.edu.sg.
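
The abstract states only that sequences were transformed into feature vectors of amino acid physicochemical properties; it does not name the properties or the descriptor scheme. As a minimal sketch, assuming a composition/transition scheme over three-class residue groupings, an encoding of one protein and the concatenation a combined strain-level model could use might look like the following. The property groupings and the names `protein_features` and `strain_features` are hypothetical, chosen for illustration only.

```python
# Hypothetical three-class residue groupings (assumed for illustration; the
# abstract does not say which physicochemical properties were used).
PROPERTY_CLASSES = {
    "hydrophobicity": ("RKEDQN", "GASTPHY", "CLVIMFW"),  # polar, neutral, hydrophobic
    "polarity": ("LIFWCMVY", "PAGST", "HQRKNED"),        # low, medium, high polarity
}


def residue_class(aa, groups):
    """Return the class index (0, 1 or 2) of residue `aa`, or None if it is non-standard."""
    for idx, members in enumerate(groups):
        if aa in members:
            return idx
    return None


def ct_features(seq, groups):
    """Composition (3 values) and transition (3 values) descriptors for one property."""
    # Non-standard residues (e.g. X) are skipped, so their neighbours become adjacent.
    classes = [c for c in (residue_class(aa, groups) for aa in seq.upper()) if c is not None]
    n = len(classes)
    if n == 0:
        return [0.0] * 6
    composition = [classes.count(k) / n for k in range(3)]
    # Transition: fraction of adjacent residue pairs that switch between two classes.
    pairs = list(zip(classes, classes[1:]))
    denom = max(len(pairs), 1)
    transition = [
        sum(1 for a, b in pairs if {a, b} == {i, j}) / denom
        for i, j in ((0, 1), (0, 2), (1, 2))
    ]
    return composition + transition


def protein_features(seq):
    """Fixed-length vector for one protein: descriptors for each property, in sorted order."""
    vec = []
    for name in sorted(PROPERTY_CLASSES):
        vec.extend(ct_features(seq, PROPERTY_CLASSES[name]))
    return vec


def strain_features(proteins, order):
    """Strain-level vector: per-protein vectors concatenated in a fixed protein order."""
    vec = []
    for name in order:
        vec.extend(protein_features(proteins[name]))
    return vec
```

Because every property contributes a fixed number of values, each protein maps to a vector of the same length regardless of sequence length, which is the kind of fixed-width input a random forest expects. Fixing the protein order in `strain_features` keeps feature positions consistent across strains.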
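
The abstract does not describe how the random forest was configured. The sketch below is a from-scratch illustration of the random forest technique itself (bootstrap resampling, a fresh random feature subset at every split, and majority voting across trees), not the authors' implementation. The names `RandomForest`, `build_tree` and `predict_tree`, the depth limit and the default tree count are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X, y, max_features, rng, depth=0, max_depth=10, min_size=2):
    """Grow one CART-style tree; every node draws a fresh random subset of features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return majority  # leaf node: a class label (labels must not be tuples)
    best = None  # (weighted Gini, feature, threshold, left indices, right indices)
    for f in rng.sample(range(len(X[0])), max_features):
        values = sorted(set(row[f] for row in X))
        for a, b in zip(values, values[1:]):
            threshold = (a + b) / 2  # midpoint between adjacent distinct values
            left = [i for i, row in enumerate(X) if row[f] <= threshold]
            right = [i for i, row in enumerate(X) if row[f] > threshold]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no feature in the drawn subset can split this node
        return majority
    _, f, threshold, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          max_features, rng, depth + 1, max_depth, min_size)

    return (f, threshold, grow(left), grow(right))


def predict_tree(node, row):
    """Follow internal nodes (tuples) down to a leaf label."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


class RandomForest:
    def __init__(self, n_trees=100, max_features=None, seed=0):
        self.n_trees = n_trees
        self.max_features = max_features
        self.rng = random.Random(seed)
        self.trees = []

    def fit(self, X, y):
        p = len(X[0])
        # Default subset size floor(sqrt(p)), the usual choice for classification.
        m = min(p, self.max_features or max(1, int(math.sqrt(p))))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx], m, self.rng))
        return self

    def predict_proba(self, row, positive):
        """Fraction of trees voting for the `positive` class."""
        votes = [predict_tree(t, row) for t in self.trees]
        return votes.count(positive) / len(votes)

    def predict(self, row):
        """Majority vote across all trees."""
        return Counter(predict_tree(t, row) for t in self.trees).most_common(1)[0][0]
```

Drawing the feature subset at each node, rather than once per tree, is what distinguishes a random forest from plain bagged trees and decorrelates the ensemble. A real model would normally use an established library rather than this sketch, which scans every candidate threshold and is slow on large data.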
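
The reported figures are accuracy (ACC), area under the ROC curve (AUC) and the Matthews correlation coefficient (MCC). Taking human as the positive class (an assumption; the abstract does not say), ACC = (TP + TN) / (TP + TN + FP + FN) and MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)), while AUC is the probability that a randomly chosen positive sequence receives a higher score than a randomly chosen negative one. A minimal sketch of these standard definitions follows; the function names are hypothetical and the abstract does not state how the authors computed the metrics.

```python
import math


def confusion(y_true, y_pred, positive):
    """Return (TP, TN, FP, FN) for a binary labelling with the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when any marginal count is zero (denominator undefined).
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(scores, labels, positive):
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == positive]
    neg = [s for s, l in zip(scores, labels) if l != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For a random forest, a natural score for AUC is the fraction of trees voting for the positive class. All three metrics are only meaningful when computed on held-out data, such as cross-validation folds, rather than on the training sequences.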
