MLRF: Multi-label Classification Through Random Forest with Label-Set Partition

Although random forest is one of the best ensemble learning algorithms for single-label classification, exploiting it for multi-label classification remains challenging, and few methods have been investigated in the literature. This paper proposes MLRF, a multi-label classification method based on a variation of random forest. The algorithm introduces a new label-set partition method that transforms a multi-label data set into multiple single-label data sets, effectively discovering correlated labels to optimize the label-subset partition. For each generated single-label subset, a classifier is learned by an improved random forest algorithm that employs a kNN-like on-line instance sampling method. Experimental results on ten benchmark data sets demonstrate that MLRF outperforms other state-of-the-art multi-label classification algorithms on a range of evaluation criteria widely used for multi-label classification.
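The abstract's two-stage pipeline (partition the label set into correlated subsets, then train one random forest per subset on the resulting single-label problems) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the label partition is fixed by hand here, whereas MLRF derives it from label correlations, and plain scikit-learn forests stand in for the paper's improved random forest with kNN-like on-line instance sampling.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # 100 instances, 5 features
Y = (rng.random((100, 4)) < 0.3).astype(int)  # 4 binary labels per instance

# Assumed partition of the 4 labels into correlated subsets.
# MLRF would discover this partition from the data; it is hard-coded here.
partition = [[0, 1], [2, 3]]

forests = []
for subset in partition:
    # Label powerset: each distinct combination of the subset's labels
    # becomes one class of an ordinary single-label problem.
    y_ps = np.array(["".join(map(str, row)) for row in Y[:, subset]])
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    forests.append((subset, clf.fit(X, y_ps)))

def predict(x):
    """Assemble a full multi-label prediction from the per-subset forests."""
    y_hat = np.zeros(Y.shape[1], dtype=int)
    for subset, clf in forests:
        combo = clf.predict(x.reshape(1, -1))[0]  # e.g. "01"
        y_hat[subset] = [int(c) for c in combo]
    return y_hat

pred = predict(X[0])
```

The powerset transformation lets each forest model dependencies within its label subset, while partitioning keeps the number of classes per forest manageable.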
