Effective active learning strategy for multi-label learning

Abstract: Data labelling is commonly an expensive process that requires expert handling. In multi-label data, labelling is further complicated because experts must label each example several times, as each example may belong to several categories. Active learning is concerned with learning accurate classifiers by choosing which examples will be labelled, reducing the labelling effort and the cost of training an accurate model. The main challenge in multi-label active learning is designing effective strategies that measure the informative potential of unlabelled examples across all labels. This paper presents a new active learning strategy for multi-label data. Two uncertainty measures, based respectively on the base classifier predictions and on the inconsistency of a predicted label set, are defined to select the most informative examples. The proposed strategy was compared with several state-of-the-art strategies on a large number of datasets. The experimental results show the effectiveness of the proposal for multi-label active learning.
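The abstract does not give the formulas for the two measures, so the following is only a generic sketch of how a prediction-based uncertainty score and a label-set inconsistency score could be combined to pick a query from an unlabelled pool. Everything in it is an assumption made for illustration and not the paper's definitions: the function names, the margin-to-0.5 uncertainty, the comparison of predicted label count against the average label cardinality of the labelled set, and the additive combination.

```python
from typing import Sequence


def margin_uncertainty(probs: Sequence[float]) -> float:
    """Illustrative prediction-based uncertainty (an assumption).

    Averages 1 - |2p - 1| over the labels, so a label predicted with
    p = 0.5 contributes 1 and a label with p in {0, 1} contributes 0.
    """
    return sum(1.0 - abs(2.0 * p - 1.0) for p in probs) / len(probs)


def cardinality_inconsistency(
    probs: Sequence[float], avg_cardinality: float, threshold: float = 0.5
) -> float:
    """Illustrative label-set inconsistency (an assumption).

    Gap between the number of labels predicted as relevant and the
    average number of labels per labelled example, normalised by the
    total number of labels.
    """
    predicted = sum(1 for p in probs if p >= threshold)
    return abs(predicted - avg_cardinality) / len(probs)


def select_query(
    pool_probs: Sequence[Sequence[float]], avg_cardinality: float
) -> int:
    """Return the index of the pool example with the highest combined score.

    The two scores are simply added here; this combination rule is a
    placeholder, not the one used in the paper.
    """
    scores = [
        margin_uncertainty(p) + cardinality_inconsistency(p, avg_cardinality)
        for p in pool_probs
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical per-label probabilities for three unlabelled examples.
pool = [
    [0.90, 0.10, 0.95],
    [0.50, 0.45, 0.60],
    [0.99, 0.98, 0.01],
]
query_index = select_query(pool, avg_cardinality=1.0)
```

In a pool-based loop, the selected example would be sent to the expert for labelling, added to the labelled set, and the base classifier retrained before the next selection.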
