Information Gain as a Feature Selection Method for the Efficient Classification of Influenza Based on Viral Hosts

and nonhuman hosts prior to classification analysis. Accuracy, sensitivity, specificity, precision, and time were used as performance measures. Selecting the hundred most informative positions by information gain improved classification efficiency by 90% for both classifiers without significantly compromising performance. When the number of informative positions was reduced below one hundred, neural networks (NNs) outperformed decision trees (DTs) on both DNA segments. When classifying the H1, PB1 segment, the classification speed of NNs improved substantially more than that of DTs.
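Information-gain feature selection over aligned sequences can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes equal-length aligned sequences, treats each alignment position as a categorical feature, and treats host labels as classes. The function names (`entropy`, `information_gain`, `top_k_positions`, `project`) are hypothetical, and gap handling and tie-breaking are assumptions. Each position is scored as IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v), and the k highest-scoring positions are kept.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for one alignment position."""
    n = len(labels)
    groups = defaultdict(list)  # nucleotide symbol -> labels of sequences carrying it
    for symbol, label in zip(column, labels):
        groups[symbol].append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_positions(sequences, labels, k):
    """Rank aligned positions by information gain and return the k best indices."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for pos in range(length):
        column = [s[pos] for s in sequences]
        scores.append((information_gain(column, labels), pos))
    # Assumed tie-break: higher gain first, then lower position index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [pos for _, pos in scores[:k]]


def project(sequences, positions):
    """Reduce each sequence to the selected positions (the classifier's input)."""
    return [[s[p] for p in positions] for s in sequences]
```

On a toy alignment such as `["ACGT", "ACGA", "TCGT", "TCGA"]` with hosts `["human", "human", "avian", "avian"]`, position 0 separates the hosts perfectly and receives a gain of 1 bit, so `top_k_positions(..., 1)` selects it. For the setting described above, one would call `top_k_positions(aligned_seqs, host_labels, 100)` and pass the projected vectors to the NN or DT classifier. Because information gain scores each position independently, it is cheap to compute, but it ignores interactions between positions.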
