Improving the performance of hierarchical classification with swarm intelligence

In this paper we propose a new method to improve the performance of hierarchical classification. We use a swarm intelligence algorithm to select the type of classification algorithm to be used at each "classifier node" in a classifier tree. These classifier nodes are applied in a top-down, divide-and-conquer fashion to classify examples from hierarchical data sets. Our approach attempts to mitigate a major drawback of a recently proposed greedy algorithm based on local search: the greedy algorithm cannot take interactions between classifiers into account, whereas the swarm intelligence based approach can. We evaluate the proposed method against the greedy method on four challenging bioinformatics data sets and find that, overall, there is a significant increase in performance.
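
As a rough illustration of the idea, the sketch below shows top-down classification through a small classifier tree in which every classifier node is assigned one of several candidate classifier types, and a complete assignment is scored by the accuracy of the whole tree, so that interactions between nodes count. Everything in it is an assumption made for illustration: the toy hierarchy, the two threshold rules standing in for real classification algorithms, the data, and all names. An exhaustive search stands in for the swarm intelligence algorithm, which would explore the same assignment space on realistic problems where enumeration is infeasible.

```python
from itertools import product

# Hypothetical toy hierarchy: each classifier node maps to its child classes.
CHILDREN = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}


def threshold_rule(feature_index, cut):
    """Stand-in for a classifier type: picks a child from a single feature."""
    def choose(x, children):
        return children[0] if x[feature_index] < cut else children[1]
    return choose


# Hypothetical candidate classifier types available at every node.
CANDIDATES = {
    "rule_f0": threshold_rule(0, 0.5),
    "rule_f1": threshold_rule(1, 0.5),
}

# Toy labelled examples: (feature vector, leaf class).
DATA = [
    ((0.1, 0.2), "A1"),
    ((0.2, 0.9), "A2"),
    ((0.8, 0.1), "B1"),
    ((0.9, 0.7), "B2"),
]


def predict_top_down(x, assignment):
    """Walk from the root to a leaf, using the classifier assigned to each node."""
    node = "root"
    while node in CHILDREN:
        classifier = CANDIDATES[assignment[node]]
        node = classifier(x, CHILDREN[node])
    return node


def tree_accuracy(data, assignment):
    """Score a complete assignment by the accuracy of the whole tree."""
    hits = sum(predict_top_down(x, assignment) == y for x, y in data)
    return hits / len(data)


def best_assignment(data):
    """Exhaustive stand-in for the swarm search over per-node classifier types."""
    nodes = list(CHILDREN)
    best, best_acc = None, -1.0
    for combo in product(CANDIDATES, repeat=len(nodes)):
        assignment = dict(zip(nodes, combo))
        acc = tree_accuracy(data, assignment)
        if acc > best_acc:
            best, best_acc = assignment, acc
    return best, best_acc


best, best_acc = best_assignment(DATA)
print(best, best_acc)
```

Scoring whole assignments rather than each node in isolation is what lets a search of this kind reward combinations of classifiers that work well together, which a node-by-node greedy choice cannot see.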
