Multi-label classification with a reject option

We consider multi-label classification problems in application scenarios where classifier accuracy is not satisfactory, but manual annotation is too costly. In single-label problems, a well-known solution consists of using a reject option, i.e., allowing a classifier to withhold unreliable decisions, leaving them (and only them) to human operators. We argue that this solution can also be exploited in multi-label problems. However, the current theoretical framework for classification with a reject option applies only to single-label problems, so we develop a specific framework for multi-label ones. In particular, we extend multi-label accuracy measures to take rejections into account, and we model the effort of manually handling rejected decisions through a cost function. We then formalise the goal of attaining a desired trade-off between classifier accuracy on non-rejected decisions and the cost of manually handling rejected decisions as a constrained optimisation problem. Finally, we develop two possible implementations of our framework, tailored to the widely used F measure and to the only cost models proposed so far for multi-label annotation tasks, and we evaluate them experimentally on five application domains.
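
The abstract does not spell out the decision rule, the extended accuracy measure or the cost models, so the following is only a minimal sketch of the general idea under its own assumptions. Each (sample, label) decision is rejected when its score falls strictly inside a band of half-width `w` around a single decision threshold `t`. Accuracy is micro-averaged F computed over non-rejected decisions only. The manual-annotation cost is simply the fraction of rejected decisions, a hypothetical linear cost and not one of the multi-label annotation cost models the paper refers to. The constrained optimisation is solved by a plain grid search over `w`. All function names (`decide`, `micro_f1_on_accepted`, `rejection_cost`, `best_band`) and the toy data are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

# 1 = label assigned, 0 = label not assigned, None = decision rejected
Decision = Optional[int]


def decide(scores: Sequence[Sequence[float]], t: float, w: float) -> List[List[Decision]]:
    """Hypothetical per-(sample, label) reject rule: withhold any decision whose
    score lies strictly inside the band (t - w, t + w) around threshold t."""
    decisions: List[List[Decision]] = []
    for row in scores:
        out: List[Decision] = []
        for s in row:
            if s >= t + w:
                out.append(1)
            elif s <= t - w:
                out.append(0)
            else:
                out.append(None)  # left to a human annotator
        decisions.append(out)
    return decisions


def micro_f1_on_accepted(decisions: Sequence[Sequence[Decision]],
                         y_true: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 computed only over non-rejected (sample, label) pairs."""
    tp = fp = fn = 0
    for d_row, y_row in zip(decisions, y_true):
        for d, y in zip(d_row, y_row):
            if d is None:
                continue  # rejected pairs do not count towards accuracy
            if d == 1 and y == 1:
                tp += 1
            elif d == 1 and y == 0:
                fp += 1
            elif d == 0 and y == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # undefined F treated as 0 here


def rejection_cost(decisions: Sequence[Sequence[Decision]]) -> float:
    """Assumed linear cost model: the fraction of (sample, label) pairs rejected."""
    total = sum(len(row) for row in decisions)
    rejected = sum(d is None for row in decisions for d in row)
    return rejected / total if total else 0.0


def best_band(scores: Sequence[Sequence[float]], y_true: Sequence[Sequence[int]],
              t: float = 0.5, max_cost: float = 0.2,
              steps: int = 51) -> Tuple[float, float, float]:
    """Grid search: maximise F on accepted decisions subject to cost <= max_cost.
    Returns (w, F, cost). w = 0 rejects nothing, so for any max_cost >= 0
    at least one feasible point exists."""
    best = (0.0, -1.0, 0.0)
    for i in range(steps):
        w = 0.5 * i / (steps - 1)  # half-widths from 0 to 0.5
        decisions = decide(scores, t, w)
        cost = rejection_cost(decisions)
        if cost > max_cost:
            continue  # violates the cost constraint
        f = micro_f1_on_accepted(decisions, y_true)
        if f > best[1]:
            best = (w, f, cost)
    return best


# Toy scores (e.g. posterior estimates) and true labels: 3 samples x 3 labels.
scores = [[0.90, 0.45, 0.20], [0.55, 0.80, 0.10], [0.35, 0.60, 0.95]]
y_true = [[1, 1, 0], [0, 1, 0], [0, 1, 1]]
w, f, cost = best_band(scores, y_true, t=0.5, max_cost=0.25)
print(f"w={w:.2f}  F on accepted={f:.3f}  rejection cost={cost:.3f}")
```

On this toy input, the classifier without rejection (`w = 0`) makes one false negative (score 0.45) and one false positive (score 0.55), giving F = 0.8. A cost budget of 0.25 lets the search reject exactly those two decisions, raising F on the accepted ones to 1.0 at a cost of 2/9. The grid search is used only for transparency. Because the rejected set can only grow as `w` increases, a search could stop as soon as the cost budget is exceeded.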
