An empirical study of empty prediction of multi-label classification

This is the first empirical study of empty prediction in multi-label classification. Every algorithm considered made empty predictions on various datasets. HOMER and RAkEL have the lowest overall empty prediction rates in the study. Four thresholding methods that can, in theory, eliminate empty predictions are compared. Probabilistic thresholding is the best of these in terms of example-based F1.

This paper presents a detailed and extensive empirical study of empty prediction in multi-label classification. To the best of our knowledge, it is the first empirical study of this problem. In total, eight state-of-the-art multi-label classification methods (BR, CC, CLR, HOMER, RAkEL, ECC, MLkNN, and BRkNN) are compared on 11 datasets. The empirical results clearly answer two research questions: (1) whether empty prediction occurs in commonly used state-of-the-art multi-label classification methods and, if so, what their empty prediction rates (EPRs) are on different test sets; and (2) which multi-label classification methods have the highest and lowest overall EPRs. Specifically, every method considered is shown empirically to make empty predictions on various datasets. In addition, several thresholding methods that can, in theory, eliminate empty prediction are compared. The clear answers to these two research questions and the accompanying experimental findings are the main contributions of this work to multi-label classification.
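The following minimal sketch shows how an empty prediction arises, how the abstract's two measures might be computed, and why thresholding matters. It assumes that EPR is the fraction of test instances whose predicted label set is empty. It also uses a fixed score threshold of 0.5 and adds a hypothetical top-1 fallback as one simple way to guarantee a non-empty prediction. This fallback is not claimed to be any of the four thresholding methods compared in the study. Two further choices are assumptions as well: scoring an instance whose true and predicted label sets are both empty as F1 = 1, and all function names.

```python
from typing import List, Sequence, Set


def threshold_bipartition(scores: Sequence[float], t: float = 0.5) -> Set[int]:
    """Predict every label whose score is at least t. The result may be empty."""
    return {j for j, s in enumerate(scores) if s >= t}


def threshold_with_fallback(scores: Sequence[float], t: float = 0.5) -> Set[int]:
    """Hypothetical fix: if no label reaches t, predict the single
    highest-scoring label so the prediction is never empty."""
    labels = threshold_bipartition(scores, t)
    if not labels and len(scores) > 0:
        labels = {max(range(len(scores)), key=lambda j: scores[j])}
    return labels


def empty_prediction_rate(predictions: List[Set[int]]) -> float:
    """Assumed EPR definition: fraction of instances with an empty predicted set."""
    if not predictions:
        return 0.0
    return sum(1 for z in predictions if not z) / len(predictions)


def example_based_f1(truth: List[Set[int]], predictions: List[Set[int]]) -> float:
    """Mean over instances of 2|Y & Z| / (|Y| + |Z|).
    Assumed convention: an instance where both sets are empty scores 1.0."""
    if len(truth) != len(predictions):
        raise ValueError("truth and predictions must have the same length")
    if not truth:
        return 0.0
    total = 0.0
    for y, z in zip(truth, predictions):
        denom = len(y) + len(z)
        total += 1.0 if denom == 0 else 2 * len(y & z) / denom
    return total / len(truth)


if __name__ == "__main__":
    # Made-up label scores for 3 instances and 3 labels, for illustration only.
    scores = [[0.9, 0.2, 0.6], [0.3, 0.4, 0.1], [0.2, 0.7, 0.8]]
    truth = [{0, 2}, {1}, {2}]

    plain = [threshold_bipartition(s) for s in scores]    # second set is empty
    fixed = [threshold_with_fallback(s) for s in scores]  # second set becomes {1}

    print(round(empty_prediction_rate(plain), 3))     # 0.333
    print(round(example_based_f1(truth, plain), 3))   # 0.556
    print(round(empty_prediction_rate(fixed), 3))     # 0.0
    print(round(example_based_f1(truth, fixed), 3))   # 0.889
```

Under the F1 convention above, an empty prediction for an instance with at least one true label contributes 0 to example-based F1. Removing empty predictions can therefore raise F1 whenever the fallback label happens to be correct, as it does here. It can also fail to help when the fallback label is wrong.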
