Cost-Sensitive Reference Pair Encoding for Multi-Label Learning

Label space expansion for multi-label classification (MLC) is a methodology that encodes the original label vectors into higher-dimensional codes before training and decodes the predicted codes back into label vectors during testing. The methodology has been demonstrated to improve the performance of MLC algorithms when coupled with off-the-shelf error-correcting codes for encoding and decoding. Nevertheless, such coding schemes can be complicated to implement and cannot easily satisfy a common application need of cost-sensitive MLC: adapting to different evaluation criteria of interest. In this work, we show that a simpler coding scheme based on the concept of a reference pair of label vectors achieves cost-sensitivity more naturally. In particular, our proposed cost-sensitive reference pair encoding (CSRPE) algorithm consists of cluster-based encoding, weight-based training and voting-based decoding steps, all of which utilize the cost information. Furthermore, we leverage the cost information embedded in the code space of CSRPE to propose a novel active learning algorithm for cost-sensitive MLC. Extensive experimental results verify that CSRPE performs better than state-of-the-art algorithms across different MLC criteria. The results also demonstrate that the CSRPE-backed active learning algorithm outperforms existing algorithms for active MLC, which further justifies the usefulness of CSRPE.
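To make the reference-pair idea concrete, the following Python sketch encodes a label vector into one bit per pair of reference label vectors and decodes by voting. It is a minimal illustration under stated assumptions, not the CSRPE algorithm itself: the reference vectors are chosen by hand rather than by clustering, all pairs are enumerated in a one-versus-one style, a bit records which reference of its pair is cheaper for the true label vector under the chosen cost, the absolute cost gap is used as an assumed per-example weight, and no classifiers are trained. The helper names `hamming_cost`, `f1_cost`, `encode` and `decode` are hypothetical.

```python
import itertools
from typing import Callable, Dict, List, Sequence, Tuple

LabelVec = Tuple[int, ...]
Cost = Callable[[LabelVec, LabelVec], float]


def hamming_cost(y_true: LabelVec, y_pred: LabelVec) -> float:
    """Hamming loss: fraction of labels on which the two vectors disagree."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)


def f1_cost(y_true: LabelVec, y_pred: LabelVec) -> float:
    """1 - F1 score; two all-zero vectors are treated as a perfect match."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == 1 and b == 1)
    denom = sum(y_true) + sum(y_pred)
    return 0.0 if denom == 0 else 1.0 - 2.0 * tp / denom


def encode(
    y: LabelVec, pairs: Sequence[Tuple[LabelVec, LabelVec]], cost: Cost
) -> Tuple[List[int], List[float]]:
    """One bit and one weight per reference pair (ref_a, ref_b).

    bit = 1 if predicting ref_a costs no more than predicting ref_b for the
    true label vector y; weight = |cost gap|, so pairs that y cannot
    distinguish contribute zero weight (assumed weighting scheme).
    """
    bits, weights = [], []
    for ref_a, ref_b in pairs:
        cost_a, cost_b = cost(y, ref_a), cost(y, ref_b)
        bits.append(1 if cost_a <= cost_b else 0)
        weights.append(abs(cost_a - cost_b))
    return bits, weights


def decode(
    bits: Sequence[int],
    pairs: Sequence[Tuple[LabelVec, LabelVec]],
    references: Sequence[LabelVec],
) -> LabelVec:
    """Voting-based decoding: each bit votes for the reference it prefers;
    the reference with the most votes (first one on ties) is returned."""
    votes: Dict[LabelVec, int] = {ref: 0 for ref in references}
    for bit, (ref_a, ref_b) in zip(bits, pairs):
        votes[ref_a if bit == 1 else ref_b] += 1
    return max(references, key=lambda ref: votes[ref])


# Hypothetical reference label vectors over three labels.
references = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
pairs = list(itertools.combinations(references, 2))  # one-versus-one style

bits, weights = encode((1, 1, 0), pairs, hamming_cost)
print(bits)                               # [1, 0, 1, 0, 1, 1]
print(decode(bits, pairs, references))    # (1, 1, 0)
```

Passing `f1_cost` instead of `hamming_cost` to `encode` can change which reference each bit prefers, which is the sense in which the code itself carries the evaluation criterion. In a learned version, each bit would be predicted by a binary classifier, and the weights could be supplied as per-example weights, for instance through the `sample_weight` argument that many scikit-learn classifiers accept in `fit`.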