Paper information - Multi-label imbalanced classification based on assessments of cost and value

Multi-label imbalanced classification based on assessments of cost and value

Multi-label imbalanced data are data in which the number of samples differs disproportionately across classes. Traditional classifiers are better suited to balanced data: their classification performance declines dramatically when the class sizes in multi-label data are imbalanced. In this study, we propose an algorithm that assesses the cost of the majority class and the value of the minority class to handle the multi-label imbalanced data classification problem. The main idea of our algorithm is to quantify the cost of the majority class and the value of the minority class based on an imbalance ratio. In the data preprocessing step, we employ a penalty function to determine how many majority class instances to eliminate, and the contribution of each instance determines whether that majority class instance is eliminated. In the classification step, we propose a metric to control the cost of the majority class and the value of the minority class. Experiments showed that this algorithm can improve the performance of multi-label imbalanced data classification.
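To make the role of an imbalance ratio in preprocessing more concrete, the following is a minimal sketch, not the method proposed in the paper. The abstract does not give the penalty function or the instance contribution measure, so the sketch substitutes two assumptions: the per-label imbalance ratio commonly used in the multi-label literature (count of the most frequent label divided by the count of each label), and plain random undersampling of instances that carry no minority label. The names `imbalance_ratios`, `mean_ir`, `undersample_majority`, and the parameter `removal_fraction` are hypothetical.

```python
import numpy as np


def imbalance_ratios(Y):
    """Per-label imbalance ratio: count of the most frequent label
    divided by the count of each label (1.0 for the most frequent)."""
    Y = np.asarray(Y)
    counts = Y.sum(axis=0).astype(float)
    if np.any(counts == 0):
        raise ValueError("every label needs at least one positive instance")
    return counts.max() / counts


def mean_ir(Y):
    """Average of the per-label imbalance ratios."""
    return float(imbalance_ratios(Y).mean())


def undersample_majority(Y, removal_fraction=0.5, seed=0):
    """Return the row indices to keep.

    Assumption (stand-in for the paper's penalty function): a fixed
    fraction of the instances that carry no minority label is dropped
    at random. A label counts as minority here when its imbalance
    ratio exceeds the mean ratio.
    """
    Y = np.asarray(Y)
    irs = imbalance_ratios(Y)
    minority = irs > irs.mean()
    carries_minority = (Y[:, minority] > 0).any(axis=1)
    candidates = np.flatnonzero(~carries_minority)
    n_remove = int(removal_fraction * len(candidates))
    rng = np.random.default_rng(seed)
    drop = rng.choice(candidates, size=n_remove, replace=False)
    return np.setdiff1d(np.arange(len(Y)), drop)


if __name__ == "__main__":
    # Toy label matrix: rows are instances, columns are labels.
    Y = np.array([
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 1, 0],
        [1, 0, 1],
    ])
    keep = undersample_majority(Y, removal_fraction=0.5)
    print("mean imbalance ratio before:", mean_ir(Y))
    print("mean imbalance ratio after: ", mean_ir(Y[keep]))
```

In this sketch every instance carrying a minority label is always kept, so removing majority-only instances can only lower the ratio between the largest label count and the minority counts. A method along the lines described in the abstract would replace the fixed `removal_fraction` with a penalty function of the imbalance ratio and replace the random choice with a per-instance contribution score.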
