Multi-label imbalanced classification based on assessments of cost and value

In multi-label imbalanced data, the classes contain disproportionate numbers of samples. Traditional classifiers are better suited to balanced data, and their performance declines sharply when class sizes in multi-label data are imbalanced. In this study, we propose an algorithm that handles multi-label imbalanced classification by assessing the cost of the majority class and the value of the minority classes. The main idea is to quantify both the cost and the value on the basis of an imbalance ratio. In the data preprocessing step, a penalty function determines how many majority class instances to eliminate, and the contribution of each majority class instance determines whether that instance is eliminated. In the classification step, we propose a metric that controls the cost of the majority class and the value of the minority classes. Experiments showed that the algorithm can improve classification performance on multi-label imbalanced data.
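The abstract does not define the imbalance ratio it relies on. As an illustration only, the sketch below assumes one common per-label definition: the count of the most frequent label divided by the count of each label. The function name `label_imbalance_ratios` and the toy label matrix are hypothetical, and neither the penalty function nor the proposed metric is reproduced here.

```python
from typing import List


def label_imbalance_ratios(Y: List[List[int]]) -> List[float]:
    """Per-label imbalance ratio under an ASSUMED definition:
    (count of the most frequent label) / (count of this label).
    A ratio of 1.0 marks the majority label; larger values mark rarer labels.
    """
    n_labels = len(Y[0])
    # Number of instances carrying each label (column sums of the 0/1 matrix).
    counts = [sum(row[j] for row in Y) for j in range(n_labels)]
    max_count = max(counts)
    # Labels that never occur get an infinite ratio instead of dividing by zero.
    return [max_count / c if c > 0 else float("inf") for c in counts]


# Toy binary label matrix: 6 instances, 3 labels (hypothetical data).
Y = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
]

ratios = label_imbalance_ratios(Y)
mean_ratio = sum(ratios) / len(ratios)
print(ratios, mean_ratio)
```

Under this assumed definition, the first label acts as the majority class and the third as the rarest minority class, so a ratio of this kind could serve as the input to a penalty that decides how many majority instances to drop.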
