Labelling strategies for hierarchical multi-label classification techniques

Many hierarchical multi-label classification systems predict a real-valued score for every (instance, class) pair, with a higher score reflecting more confidence that the instance belongs to that class. These classifiers leave the conversion of scores into an actual label set to the user, who applies a cut-off value to the scores. Their predictive performance is usually evaluated using threshold-independent measures such as precision-recall curves. However, several applications require actual label sets, and thus an automatic labelling strategy.

In this paper, we present and evaluate different alternatives for performing the actual labelling in hierarchical multi-label classification. We investigate the selection of both single and multiple thresholds. Although several threshold selection strategies exist for non-hierarchical multi-label classification, they cannot be applied directly in the hierarchical context. The proposed strategies are implemented within two main approaches: optimising a performance measure of interest (such as F-measure or hierarchical loss), and reproducing training set properties (such as class distribution or label cardinality) in the predictions. We assess the performance of the proposed labelling schemes on 10 datasets from different application domains. Our results show that selecting multiple thresholds may result in an efficient and effective solution for hierarchical multi-label problems.

Highlights
We select single or multiple thresholds for hierarchical multi-label classifiers.
Selecting multiple thresholds often yields better label sets in less time.
We show that optimising H-loss tends to favour empty label sets.
Multiple threshold selection is preferred for micro F-measure and HMC-loss.
Imitating training set properties is a competitive approach to optimising HMC-loss.
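As a rough illustration of the two approaches named in the abstract, the sketch below selects a single global threshold in two ways: by maximising micro F-measure on labelled data, and by matching the label cardinality (average number of labels per instance) of the training set. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the toy scores, and the choice of candidate thresholds (the distinct score values) are all hypothetical, and the hierarchy is ignored entirely.

```python
import numpy as np

def micro_f1(y_true, y_pred):
    """Micro-averaged F-measure over all (instance, class) pairs; inputs are boolean arrays."""
    tp = np.logical_and(y_true, y_pred).sum()
    fp = np.logical_and(~y_true, y_pred).sum()
    fn = np.logical_and(y_true, ~y_pred).sum()
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def threshold_by_f1(scores, y_true):
    """Single threshold that maximises micro F-measure (ties go to the lowest threshold)."""
    candidates = np.unique(scores)  # assumption: only observed score values are tried
    f1s = [micro_f1(y_true, scores >= t) for t in candidates]
    return float(candidates[int(np.argmax(f1s))])

def threshold_by_cardinality(scores, target_cardinality):
    """Single threshold whose predicted label cardinality is closest to the target."""
    candidates = np.unique(scores)
    cards = np.array([(scores >= t).sum(axis=1).mean() for t in candidates])
    return float(candidates[int(np.argmin(np.abs(cards - target_cardinality)))])

# Hypothetical toy data: 3 instances, 3 classes.
scores = np.array([[0.9, 0.6, 0.2],
                   [0.8, 0.3, 0.1],
                   [0.7, 0.5, 0.4]])
y_true = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [1, 1, 0]], dtype=bool)

t_f1 = threshold_by_f1(scores, y_true)
t_card = threshold_by_cardinality(scores, y_true.sum(axis=1).mean())
labels = scores >= t_f1  # boolean label sets obtained from the scores
```

A multiple-threshold variant could apply the same search per class or per hierarchy level; the sketch keeps a single threshold for brevity and does not enforce that a predicted class implies its ancestors.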
