Multilabel Image Classification Via High-Order Label Correlation Driven Active Learning

Supervised machine learning techniques have been applied to multilabel image classification problems with tremendous success. Despite their disparate learning mechanisms, their performance relies heavily on the quality of the training images. However, acquiring training images requires significant effort from human annotators, which hinders the application of supervised learning techniques to large-scale problems. In this paper, we propose a high-order label correlation driven active learning (HoAL) approach that allows the iterative learning algorithm itself to select the informative example-label pairs from which it learns, so as to learn an accurate classifier with less annotation effort. The proposed HoAL considers four crucial issues: 1) unlike in the binary case, the selection granularity for multilabel active learning needs to be refined from the example to the example-label pair; 2) different labels are seldom independent, and label correlations provide critical information for efficient learning; 3) in addition to pairwise label correlations, high-order label correlations are also informative for multilabel active learning; and 4) since the number of label combinations increases exponentially with the number of labels, an efficient mining method is required to discover informative label correlations. The proposed approach is tested on public data sets, and the empirical results demonstrate its effectiveness.
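To make the notion of example-label pair granularity in issue 1) concrete, the following minimal sketch ranks individual (example, label) pairs by the entropy of a predicted label probability and queries the most uncertain ones. It is an illustration under stated assumptions, not the HoAL selection criterion: the entropy score, the function names `binary_entropy` and `select_pairs`, and the toy probabilities are all hypothetical, and label correlations are ignored entirely.

```python
import math
from typing import List, Set, Tuple


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_pairs(
    probs: List[List[float]],
    labeled: Set[Tuple[int, int]],
    batch_size: int,
) -> List[Tuple[int, int]]:
    """Return the batch_size unlabeled (example, label) pairs with highest entropy.

    probs[i][j] is an assumed classifier estimate of P(label j present | example i).
    labeled holds the (i, j) pairs that have already been annotated.
    """
    candidates = [
        (binary_entropy(p), i, j)
        for i, row in enumerate(probs)
        for j, p in enumerate(row)
        if (i, j) not in labeled
    ]
    # Most uncertain first; ties are broken by example index, then label index.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(i, j) for _, i, j in candidates[:batch_size]]


# Toy predictions (hypothetical): 2 images, 3 labels.
probs = [
    [0.90, 0.50, 0.10],
    [0.45, 0.99, 0.70],
]
labeled = {(0, 1)}  # this pair has already been annotated
print(select_pairs(probs, labeled, batch_size=2))  # [(1, 0), (1, 2)]
```

Note that the two selected pairs belong to the same image while its third label is left unqueried, which is the kind of partial annotation that example-level selection cannot express.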
