Bayes-Optimal Hierarchical Multilabel Classification

Hierarchical multilabel classification allows a sample to belong to multiple class labels residing on a hierarchy, which can be a tree or a directed acyclic graph (DAG). However, popular hierarchical loss functions, such as the H-loss, can only be defined on tree hierarchies (not on DAGs), and may also under- or over-penalize misclassifications near the bottom of the hierarchy. Moreover, how to make use of these loss functions in hierarchical multilabel classification remains relatively unexplored. To overcome these deficiencies, we first propose hierarchical extensions of the Hamming loss and ranking loss that take the mistakes at every node of the label hierarchy into account. We then train a general learning model that is independent of the loss function. Next, using Bayesian decision theory, we develop Bayes-optimal predictions that minimize the corresponding risks with the trained model. Computationally, instead of requiring an exhaustive summation and search for the optimal multilabel, the resulting optimization problem can be solved efficiently by a greedy algorithm. Experimental results on a number of real-world data sets show that the proposed Bayes-optimal classifier outperforms state-of-the-art methods.
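To make the decision-theoretic step concrete, the following is a minimal sketch of Bayes-optimal prediction under a node-weighted Hamming loss on a toy tree. It is not the paper's greedy algorithm: it performs the exhaustive summation and search that the greedy algorithm is meant to avoid, and is only feasible for tiny hierarchies. The hierarchy `PARENT`, the node weights, the posterior over multilabels, and all function names are hypothetical and chosen purely for illustration.

```python
from itertools import product

# Hypothetical toy tree: node index -> parent index (None for the root).
PARENT = {0: None, 1: 0, 2: 0, 3: 1}


def is_consistent(y):
    """True if every positive node also has a positive parent."""
    return all(
        PARENT[i] is None or y[PARENT[i]] == 1
        for i in PARENT
        if y[i] == 1
    )


def weighted_hamming(y_pred, y_true, weights):
    """Sum of the (assumed) node weights over nodes where the labels disagree."""
    return sum(w for w, a, b in zip(weights, y_pred, y_true) if a != b)


def bayes_optimal(posterior, weights):
    """Brute-force minimizer of the expected loss over hierarchy-consistent multilabels."""
    candidates = [
        y for y in product((0, 1), repeat=len(PARENT)) if is_consistent(y)
    ]

    def risk(y):
        # Expected loss of predicting y under the posterior P(y_true | x).
        return sum(
            p * weighted_hamming(y, y_true, weights)
            for y_true, p in posterior.items()
        )

    return min(candidates, key=risk)


# Assumed posterior over multilabels for one sample, as a trained model might supply.
posterior = {(1, 1, 0, 1): 0.6, (1, 1, 0, 0): 0.2, (1, 0, 1, 0): 0.2}
# Assumed node weights, smaller for deeper nodes.
weights = [1.0, 0.5, 0.5, 0.25]

prediction = bayes_optimal(posterior, weights)
```

Because the candidate set grows exponentially with the number of nodes, this enumeration only serves to illustrate what "minimizing the risk" means; the loss function enters only at prediction time, so the same posterior could be reused with a different loss.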
