Paper information - Multi-label classification with a reject option

We consider multi-label classification problems in application scenarios where classifier accuracy is not satisfactory but manual annotation is too costly. In single-label problems, a well-known solution is to use a reject option, i.e., to allow a classifier to withhold unreliable decisions, leaving them (and only them) to human operators. We argue that this solution can also be exploited in multi-label problems. However, the current theoretical framework for classification with a reject option applies only to single-label problems. We thus develop a specific framework for multi-label ones. In particular, we extend multi-label accuracy measures to take rejections into account, and define manual annotation cost as a cost function. We then formalise the goal of attaining a desired trade-off between classifier accuracy on non-rejected decisions and the cost of manually handling rejected decisions as a constrained optimisation problem. Finally, we develop two possible implementations of our framework, tailored to the widely used F accuracy measure and to the only cost models proposed so far for multi-label annotation tasks, and experimentally evaluate them on five application domains.
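The trade-off described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual measure definitions, cost models, or optimisation algorithms, so everything here is an assumption made for illustration: accuracy is taken to be the micro-averaged F1 computed only on non-rejected (sample, label) decisions, manual annotation cost is approximated by the fraction of rejected decisions, a single pair of thresholds is shared by all labels, and the constrained problem is solved by exhaustive grid search. The function names `f_measure_with_reject` and `best_thresholds` are hypothetical.

```python
import numpy as np


def f_measure_with_reject(scores, labels, t_low, t_high):
    """Return (micro-F1 on accepted decisions, rejection rate).

    scores, labels: array-likes of shape (n_samples, n_labels); labels are 0/1.
    Assumption: a decision is rejected when t_low < score < t_high.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)

    rejected = (scores > t_low) & (scores < t_high)
    accepted = ~rejected
    predicted = scores >= t_high  # positive decisions among the accepted ones

    tp = np.sum(accepted & predicted & (labels == 1))
    fp = np.sum(accepted & predicted & (labels == 0))
    fn = np.sum(accepted & ~predicted & (labels == 1))

    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 0.0
    return float(f1), float(rejected.mean())


def best_thresholds(scores, labels, max_reject_rate, grid):
    """Maximise F1 on accepted decisions subject to a rejection-rate budget.

    Assumption: the cost constraint is 'rejection rate <= max_reject_rate'.
    Returns a tuple (f1, t_low, t_high, reject_rate).
    """
    best = None
    for t_low in grid:
        for t_high in grid:
            if t_high < t_low:
                continue
            f1, rate = f_measure_with_reject(scores, labels, t_low, t_high)
            if rate <= max_reject_rate and (best is None or f1 > best[0]):
                best = (f1, t_low, t_high, rate)
    return best


if __name__ == "__main__":
    # Toy data: 2 samples, 2 labels (illustrative values only).
    scores = [[0.9, 0.1], [0.6, 0.4]]
    labels = [[1, 0], [0, 1]]
    print(best_thresholds(scores, labels, max_reject_rate=0.5,
                          grid=[0.3, 0.5, 0.7]))
```

When `t_low == t_high` no decision is rejected, so a feasible solution always exists for any non-negative budget. Per-label thresholds or a different cost model would change only the search space and the constraint, not the overall structure of the sketch.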
