Multi-label Classification: Dealing with Imbalance by Combining Labels

Data imbalance is a common problem in both single-label classification (SLC) and multi-label classification (MLC), and it degrades prediction performance. Although a broad range of studies address the imbalance problem, most focus on SLC, and comparatively few address MLC. In fact, imbalance arises more frequently and in more complex forms in MLC than in SLC. In this paper, we address the imbalance problem in MLC and propose a new approach called DEML. DEML transforms the whole label set of a multi-label dataset into several subsets, and each subset is treated as a multi-class dataset with a balanced class distribution; this not only addresses the imbalance problem but also preserves dataset integrity and consistency. Extensive experiments show that DEML is highly competitive in both computational cost and effectiveness.
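The general idea of treating a label subset as a multi-class problem can be illustrated with a minimal sketch. It is not the DEML algorithm: the abstract does not say how DEML forms its subsets, so the consecutive grouping in the hypothetical helper `split_labels`, the helper names `to_multiclass` and `imbalance_ratio`, and the toy label matrix `Y` are all assumptions made for illustration only.

```python
from collections import Counter

def split_labels(n_labels, k):
    """Partition label indices into consecutive groups of size k.
    Assumption: DEML's actual grouping criterion is not given in the abstract."""
    return [list(range(i, min(i + k, n_labels))) for i in range(0, n_labels, k)]

def to_multiclass(Y, subset):
    """Treat each distinct label combination over `subset` as one class."""
    return [tuple(row[j] for j in subset) for row in Y]

def imbalance_ratio(classes):
    """Ratio of the largest to the smallest class count (1.0 means balanced)."""
    counts = Counter(classes)
    return max(counts.values()) / min(counts.values())

# Toy label matrix: 6 instances, 4 binary labels (illustrative data only).
Y = [
    [1, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

for subset in split_labels(4, 2):
    classes = to_multiclass(Y, subset)
    print(subset, dict(Counter(classes)), imbalance_ratio(classes))
```

In this toy example each label pair yields two classes of three instances each, so both derived multi-class problems are balanced; on real data, the choice of subsets determines how balanced the resulting classes are.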
