Hierarchical Multilabel Classification with Minimum Bayes Risk

Hierarchical multilabel classification (HMC) allows an instance to have multiple labels residing in a hierarchy. A popular loss function used in HMC is the H-loss, which penalizes only the first classification mistake along each prediction path. However, the H-loss metric can only be used on tree-structured label hierarchies, but not on DAG hierarchies. Moreover, it may lead to misleading predictions as not all misclassifications in the hierarchy are penalized. In this paper, we overcome these deficiencies by proposing a hierarchy-aware loss function that is more appropriate for HMC. Using Bayesian decision theory, we then develop a Bayes-optimal classifier with respect to this loss function. Instead of requiring an exhaustive summation and search for the optimal multilabel, the proposed classification problem can be efficiently solved using a greedy algorithm on both tree-and DAG-structured label hierarchies. Experimental results on a large number of real-world data sets show that the proposed algorithm outperforms existing HMC methods.

[1]  Joydeep Ghosh,et al.  Automatically learning document taxonomies for hierarchical classification , 2005, WWW '05.

[2]  Alex A. Freitas,et al.  A survey of hierarchical classification across different application domains , 2010, Data Mining and Knowledge Discovery.

[3]  Robert E. Schapire,et al.  Hierarchical multi-label prediction of gene function , 2006, Bioinform..

[4]  Eyke Hüllermeier,et al.  Bayes Optimal Multilabel Classification via Probabilistic Classifier Chains , 2010, ICML.

[5]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[6]  Tibério S. Caetano,et al.  Reverse Multi-Label Learning , 2010, NIPS.

[7]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[8]  Yoram Singer,et al.  Large margin hierarchical classification , 2004, ICML.

[9]  Claudio Gentile,et al.  Incremental Algorithms for Hierarchical Classification , 2004, J. Mach. Learn. Res..

[10]  Zhi-Hua Zhou,et al.  ML-KNN: A lazy learning approach to multi-label learning , 2007, Pattern Recognit..

[11]  J. Berger Statistical Decision Theory and Bayesian Analysis , 1988 .

[12]  James T. Kwok,et al.  MultiLabel Classification on Tree- and DAG-Structured Hierarchies , 2011, ICML.

[13]  Thomas Hofmann,et al.  Hierarchical document categorization with support vector machines , 2004, CIKM '04.

[14]  Douglas L. Jones,et al.  A signal-dependent time-frequency representation: fast algorithm for optimal kernel design , 1994, IEEE Trans. Signal Process..

[15]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[16]  Kun Zhang,et al.  Multi-label learning by exploiting label dependency , 2010, KDD.

[17]  Eyke Hüllermeier,et al.  Case-Based Multilabel Ranking , 2007, IJCAI.

[18]  Claudio Gentile,et al.  Hierarchical classification: combining Bayes with SVM , 2006, ICML.

[19]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Saso Dzeroski,et al.  Decision trees for hierarchical multi-label classification , 2008, Machine Learning.

[21]  Juho Rousu,et al.  Kernel-Based Learning of Hierarchical Multilabel Classification Models , 2006, J. Mach. Learn. Res..

[22]  Christoph H. Lampert Maximum Margin Multi-Label Structured Prediction , 2011, NIPS.

[23]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[24]  Eyke Hüllermeier,et al.  An Exact Algorithm for F-Measure Maximization , 2011, NIPS.

[25]  Lihi Zelnik-Manor,et al.  Large Scale Max-Margin Multi-Label Classification with Priors , 2010, ICML.

[26]  Grigorios Tsoumakas,et al.  Mining Multi-label Data , 2010, Data Mining and Knowledge Discovery Handbook.

[27]  Nicolò Cesa-Bianchi,et al.  Hierarchical Cost-Sensitive Algorithms for Genome-Wide Gene Function Prediction , 2009, MLSB.

[28]  Yiming Yang,et al.  The Enron Corpus: A New Dataset for Email Classi(cid:12)cation Research , 2004 .

[29]  Concha Bielza,et al.  Bayesian Chain Classifiers for Multidimensional Classification , 2011, IJCAI.

[30]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.