MLRF: Multi-label Classification Through Random Forest with Label-Set Partition

Although random forest is one of the best ensemble learning algorithms for single-label classification, exploiting it for multi-label classification remains challenging, and few methods have been investigated in the literature. This paper proposes MLRF, a multi-label classification method based on a variation of random forest. The algorithm introduces a new label-set partition method that transforms a multi-label data set into multiple single-label data sets, effectively discovering correlated labels to optimize the label-subset partition. For each generated single-label subset, a classifier is learned by an improved random forest algorithm that employs a kNN-like on-line instance sampling method. Experimental results on ten benchmark data sets demonstrate that MLRF outperforms other state-of-the-art multi-label classification algorithms on a range of evaluation criteria widely used for multi-label classification.
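The abstract's two-stage pipeline (partition the label set into correlated subsets, then train one random forest per subset on the resulting single-label problems) can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the label partition is fixed by hand here, whereas MLRF derives it from label correlations, and plain scikit-learn forests stand in for the paper's improved random forest with kNN-like on-line instance sampling.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # 100 instances, 5 features
Y = (rng.random((100, 4)) < 0.3).astype(int)  # 4 binary labels per instance

# Assumed partition of the 4 labels into correlated subsets.
# MLRF would discover this partition from the data; it is hard-coded here.
partition = [[0, 1], [2, 3]]

forests = []
for subset in partition:
    # Label powerset: each distinct combination of the subset's labels
    # becomes one class of an ordinary single-label problem.
    y_ps = np.array(["".join(map(str, row)) for row in Y[:, subset]])
    clf = RandomForestClassifier(n_estimators=50, random_state=0)
    forests.append((subset, clf.fit(X, y_ps)))

def predict(x):
    """Assemble a full multi-label prediction from the per-subset forests."""
    y_hat = np.zeros(Y.shape[1], dtype=int)
    for subset, clf in forests:
        combo = clf.predict(x.reshape(1, -1))[0]  # e.g. "01"
        y_hat[subset] = [int(c) for c in combo]
    return y_hat

pred = predict(X[0])
```

The powerset transformation lets each forest model dependencies within its label subset, while partitioning keeps the number of classes per forest manageable.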
