Methods to Edit Multi-label Training Sets Using Rough Sets Theory

In multi-label classification problems, instances can be associated with several decision classes (labels) simultaneously. One of the most successful algorithms for this kind of problem is ML-kNN, a lazy learner adapted to the multi-label scenario. All computational models that perform inference from examples share a common problem: selecting which examples should be included in the training set to increase the algorithm's efficiency. This problem is known as training set editing. Despite the extensive work in multi-label classification, there is a lack of methods for editing multi-label training sets. In this research, we propose three reduction techniques for editing multi-label training sets that rely on Rough Set Theory. The simulations show that these methods reduce the number of examples in the training sets without affecting the overall performance, while in some cases the performance is even improved.
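To make the editing task concrete: a classic baseline (not the rough-set methods proposed here) is a Wilson-style edit adapted to multi-label data, which discards a training instance whose label vector disagrees too strongly with those of its nearest neighbors. A minimal sketch in Python, where the `max_disagreement` threshold on mean Hamming disagreement is an illustrative choice, not a value from the paper:

```python
import numpy as np

def wilson_edit_multilabel(X, Y, k=3, max_disagreement=0.5):
    """Wilson-style editing for a multi-label training set.

    X: (n, d) feature matrix; Y: (n, q) binary label matrix.
    An instance is removed when the mean per-label (Hamming)
    disagreement between its label vector and those of its k
    nearest neighbors exceeds max_disagreement. This is a generic
    baseline, not the rough-set techniques of the paper.
    """
    n = X.shape[0]
    keep = np.ones(n, dtype=bool)
    # Pairwise Euclidean distances; exclude self-matches.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    for i in range(n):
        nbrs = np.argsort(d[i])[:k]              # k nearest neighbors
        disagreement = np.mean(Y[nbrs] != Y[i])  # mean Hamming mismatch
        if disagreement > max_disagreement:
            keep[i] = False
    return X[keep], Y[keep]
```

For example, an instance placed deep inside a cluster whose members carry a different label set would be flagged and removed, shrinking the training set handed to ML-kNN.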
