Constrained nonnegative matrix factorization-based semi-supervised multilabel learning

In many multilabel learning applications, fully labelled instances are scarce, while partially labelled and unlabelled data are more common because manual labelling is expensive. However, most existing models assume that sufficient fully labelled training data is available. To deal with partially labelled and unlabelled data effectively, we present a novel semi-supervised multilabel learning approach based on constrained non-negative matrix factorization. This approach assumes that if two instances are highly similar in terms of their features, they will also be similar in their associated label sets. Specifically, we first define three matrices to measure the similarity of each pair of instances in two different ways. Then, the optimal assignment of labels to the unlabelled instances is determined by minimizing the difference between these two sets of similarities via a non-negative matrix factorization process. We also present a threshold learning algorithm to determine the classification threshold for each label in our proposed approach. Extensive experiments are conducted on various datasets, and the results demonstrate that our method performs significantly better than other state-of-the-art approaches. It is especially suitable for situations in which the labelled training set is small or a subset of the training data is partially labelled.
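The core idea, matching feature-based similarity against label-based similarity through a non-negative factorization, can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a single RBF feature-similarity matrix `A`, approximates it by `Y @ Y.T` with a damped multiplicative update for symmetric NMF, clamps the rows of labelled instances to their known labels after every step, and uses a fixed threshold of 0.5 in place of the learned per-label thresholds. The function names, `sigma`, and the initial value 0.5 are all illustrative assumptions.

```python
import numpy as np


def rbf_similarity(X, sigma=1.0):
    """Pairwise RBF similarity between feature vectors (assumed kernel choice)."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def constrained_symnmf(A, Y_known, labelled_mask, n_iter=200, eps=1e-9):
    """Approximate A by Y @ Y.T with Y >= 0, keeping labelled rows fixed.

    A             : (n, n) non-negative feature-similarity matrix
    Y_known       : (n, k) 0/1 label matrix; rows of unlabelled instances are ignored
    labelled_mask : (n,) boolean array, True where the labels are known
    """
    n, k = Y_known.shape
    Y = np.full((n, k), 0.5)                  # neutral start for unlabelled rows
    Y[labelled_mask] = Y_known[labelled_mask]
    for _ in range(n_iter):
        numer = A @ Y
        denom = Y @ (Y.T @ Y) + eps           # eps guards against division by zero
        Y = Y * (0.5 + 0.5 * numer / denom)   # damped multiplicative update
        Y[labelled_mask] = Y_known[labelled_mask]  # re-impose the label constraint
    return Y


# Toy usage: two well-separated groups, one labelled instance in each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
Y_known = np.zeros((6, 2))
Y_known[0, 0] = 1.0                            # instance 0 carries label 0
Y_known[3, 1] = 1.0                            # instance 3 carries label 1
labelled_mask = np.array([True, False, False, True, False, False])

Y = constrained_symnmf(rbf_similarity(X), Y_known, labelled_mask)
predicted = (Y >= 0.5).astype(int)             # fixed threshold, an assumption
```

Clamping the labelled rows is what makes the factorization "constrained" in this sketch: the unlabelled rows of `Y` are free to move, so each unlabelled instance drifts toward the label scores of the labelled instances it resembles in feature space.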
