Integrating Decision Tree and Hidden Markov Model (HMM) for Subtype Prediction of Human Influenza A Virus

Multiple criteria decision making (MCDM) has a significant impact in bioinformatics. In the research reported here, we explore the integration of a decision tree (DT) and a Hidden Markov Model (HMM) for subtype prediction of human influenza A virus. Infection with influenza viruses continues to be an important public health problem. Viral strains of subtypes H3N2 and H1N1 circulate in humans at least twice annually. Subtype detection depends mainly on the antigenic assay, which is time-consuming and not fully accurate. We have developed a Web system for accurate subtype detection of human influenza virus sequences. A preliminary experiment showed that the system is easy to use and powerful in identifying human influenza subtypes. Our next step is to examine the informative positions at the protein level and to extend the system's current functionality to detect more subtypes. The web functions can be accessed at http://glee.ist.unomaha.edu/.
