Ant colony optimization based hierarchical multi-label classification algorithm

Graphical abstract: An example search space of hmAntMiner-C for constructing a rule antecedent.

Highlights: This paper presents a hierarchical multi-label classification algorithm (hmAntMiner-C). It uses the correlation of attribute-value pairs to construct an IF-THEN rule list. A comparison with other state-of-the-art algorithms gives promising results.

Numerous state-of-the-art classification algorithms are designed to handle data with nominal or binary class labels, where each sample belongs to a single class. In these problems, known as flat classification problems, the class labels are independent of each other. Much less attention has been given to classification problems in which a sample may belong to several classes at once and the class labels are organized in a structured hierarchy, as in gene ontology, protein function prediction, test scores, web page categorization and text categorization. This article presents a novel Ant Colony Optimization based hierarchical multi-label classification algorithm that can handle such complex classification problems and incorporates the given class hierarchy during its learning phase. The algorithm produces an ordered IF-THEN rule list, which yields a comprehensible model that experts can easily verify. It exploits the positive correlation between the domain values of two related attributes to considerably improve the discrimination power of the resulting classification model. The paper also describes hierarchical single-label (single-path) and multi-label classification problems in detail, along with the different categories of corresponding solutions. The proposed method is evaluated on sixteen challenging bioinformatics datasets, some of which contain hundreds of attributes and thousands of class labels.
Finally, the proposed method is compared with four recent state-of-the-art hierarchical multi-label classification algorithms. The empirical evaluation demonstrates the promise of the proposed technique for the hierarchical multi-label classification task.
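The abstract describes rule construction only at a high level. The following Python sketch illustrates two of its ideas under stated assumptions: an ant choosing the next antecedent term with probability proportional to pheromone times a heuristic value, and an ordered IF-THEN rule list whose predicted label set is closed under the class hierarchy. The "/"-separated label paths, the function names, the `alpha`/`beta` exponents, the default pheromone of 1.0 and especially the `correlation_heuristic` formula (a conditional co-occurrence of two attribute-value pairs) are hypothetical stand-ins. The abstract does not give hmAntMiner-C's actual heuristic, pheromone update or rule-pruning procedure, and none of those are shown here.

```python
import random

# Hypothetical label encoding: "1/2" denotes class 2 under class 1 in a tree hierarchy.


def ancestors(label):
    """Return the label together with all of its ancestors."""
    parts = label.split("/")
    return {"/".join(parts[:i]) for i in range(1, len(parts) + 1)}


def close_under_hierarchy(labels):
    """Predicting a class implies predicting all of its ancestors."""
    closed = set()
    for label in labels:
        closed |= ancestors(label)
    return closed


def correlation_heuristic(samples, prev_term, term):
    """Hypothetical stand-in heuristic (not the paper's formula): among samples
    matching the previously chosen attribute-value pair, the fraction that
    also match the candidate pair."""
    if prev_term is None:
        return 1.0  # no previous term: neutral heuristic value
    covered = [s for s in samples if s["x"].get(prev_term[0]) == prev_term[1]]
    if not covered:
        return 0.0
    both = sum(1 for s in covered if s["x"].get(term[0]) == term[1])
    return both / len(covered)


def choose_term(samples, candidates, pheromone, prev_term, rng, alpha=1.0, beta=1.0):
    """Roulette-wheel selection with probability proportional to tau^alpha * eta^beta.
    Terms missing from the pheromone table get an assumed initial value of 1.0."""
    weights = [
        pheromone.get(t, 1.0) ** alpha * correlation_heuristic(samples, prev_term, t) ** beta
        for t in candidates
    ]
    if sum(weights) == 0:
        return None  # no candidate is compatible with the partial antecedent
    return rng.choices(candidates, weights=weights, k=1)[0]


def predict(rule_list, default_labels, x):
    """Ordered rule list: the first rule whose antecedent matches x fires."""
    for antecedent, labels in rule_list:
        if all(x.get(attr) == value for attr, value in antecedent):
            return close_under_hierarchy(labels)
    return close_under_hierarchy(default_labels)


# Toy data, invented for illustration only.
samples = [
    {"x": {"a": "hi", "b": "yes"}, "y": {"1/2"}},
    {"x": {"a": "hi", "b": "yes"}, "y": {"1/2", "3"}},
    {"x": {"a": "hi", "b": "no"}, "y": {"1"}},
    {"x": {"a": "lo", "b": "no"}, "y": {"3/1"}},
]

rng = random.Random(0)
term = choose_term(samples, [("b", "yes"), ("b", "no")], {}, ("a", "hi"), rng)
print("chosen term:", term)  # weights are 2/3 for ("b", "yes") and 1/3 for ("b", "no")

rule_list = [
    ([("a", "hi"), ("b", "yes")], {"1/2"}),
    ([("a", "lo")], {"3/1"}),
]
print(sorted(predict(rule_list, {"1"}, {"a": "hi", "b": "yes"})))  # ['1', '1/2']
print(sorted(predict(rule_list, {"1"}, {"a": "lo", "b": "no"})))   # ['3', '3/1']
```

Closing the predicted set under its ancestors is one simple way to keep predictions consistent with a tree-structured hierarchy. For a DAG hierarchy such as Gene Ontology, each label would instead need an explicit parent map, since a class can have more than one parent.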