Active Multi-label Learning with Optimal Label Subset Selection

Multi-label classification, where each instance is assigned multiple labels, has been an attractive research topic in data mining. Annotating multi-label instances is typically more difficult and time-consuming, since each instance is associated with multiple labels simultaneously. Therefore, active learning, which reduces the labeling cost by actively querying the labels of the most valuable data, becomes particularly important for multi-label learning. Prior studies indicate that methods querying instance-label pairs are more effective than those querying whole instances, since for each sample only some informative labels need to be annotated, while the others can be inferred by exploiting label correlations. However, when the label space is high-dimensional, algorithms that select instance-label pairs suffer, because the computational cost of training a multi-label model can grow substantially with the number of labels. In this paper we propose an approach that combines instance sampling with optimal label subset selection, which can effectively improve the performance of the classification model and substantially reduce the annotation cost. Experimental results on three benchmark datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.
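To make the general idea concrete, the following is a minimal sketch of a two-step query: first sample an instance, then choose a small subset of its labels to annotate. It is not the selection criterion proposed in the paper, which the abstract does not specify. The function name `select_queries`, the margin-based uncertainty score, and the parameter `k` are all assumptions made for illustration.

```python
import numpy as np

def select_queries(proba, labeled_mask, k=2):
    """Pick one instance, then up to k of its labels to query.

    proba:        (n_instances, n_labels) predicted P(label = 1)
    labeled_mask: same shape, True where the pair is already annotated
    Both the scoring rule and k are illustrative assumptions.
    """
    # Binary uncertainty: 1 at p = 0.5, 0 at p = 0 or p = 1.
    uncertainty = 1.0 - 2.0 * np.abs(proba - 0.5)
    # Already-annotated pairs carry no remaining uncertainty.
    uncertainty = np.where(labeled_mask, 0.0, uncertainty)

    # Step 1 (instance sampling): instance with the highest total uncertainty.
    instance = int(np.argmax(uncertainty.sum(axis=1)))

    # Step 2 (label subset): the k most uncertain unlabeled labels of it.
    candidates = np.flatnonzero(~labeled_mask[instance])
    order = np.argsort(-uncertainty[instance, candidates], kind="stable")
    labels = candidates[order[:k]].tolist()
    return instance, labels


proba = np.array([[0.90, 0.10, 0.95],
                  [0.50, 0.45, 0.90],
                  [0.60, 0.20, 0.50]])
labeled_mask = np.zeros_like(proba, dtype=bool)
labeled_mask[1, 0] = True  # pair (instance 1, label 0) is already annotated

instance, labels = select_queries(proba, labeled_mask, k=2)
```

Querying only `k` labels per selected instance keeps the per-round annotation cost fixed regardless of the size of the label space. In a full method, the remaining labels would be inferred from label correlations, as the abstract describes.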
