Random forest using tree selection method to classify unbalanced data

Random forest is a popular classification algorithm that builds an ensemble of decision tree classifiers. However, because unbalanced data can be distributed in complex ways in high-dimensional space, a random forest may include poorly performing trees that lead to incorrect classifications. This paper proposes an improved random forest algorithm with tree selection methods, designed specifically for analyzing unbalanced data. The novel tree selection methods make the random forest framework well suited to classifying unbalanced data. Experimental results on unbalanced datasets with diverse characteristics demonstrate that the proposed method generates random forest models with higher performance than the random forests generated by Breiman's method.
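The abstract does not state which selection criterion is used, so the following is only a minimal sketch of the general idea of tree selection, not the proposed algorithm. It makes several assumptions: depth-1 trees (stumps) stand in for full decision trees to keep the example short, each tree is scored by balanced accuracy on its own out-of-bag samples, and the top-scoring trees are kept. All function names (`fit_forest`, `select_trees`, and so on) and the synthetic data are hypothetical.

```python
import random
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so the majority class cannot dominate the score."""
    if not y_true:
        return 0.0
    recalls = []
    for c in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_stump(X, y, feature):
    """Depth-1 tree on one feature; leaves predict their majority label."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback: a constant predictor, used when no valid split exists.
    best_score, best = -1.0, (feature, float("inf"), majority, majority)
    for t in sorted({row[feature] for row in X}):
        left = [yi for row, yi in zip(X, y) if row[feature] <= t]
        right = [yi for row, yi in zip(X, y) if row[feature] > t]
        if not right:
            continue
        stump = (feature, t, Counter(left).most_common(1)[0][0],
                 Counter(right).most_common(1)[0][0])
        score = balanced_accuracy(y, [predict_stump(stump, row) for row in X])
        if score > best_score:
            best_score, best = score, stump
    return best


def fit_forest(X, y, n_trees, rng):
    """Bagging with one random feature per tree; returns (stump, oob_indices) pairs."""
    n, forest = len(X), []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        feature = rng.randrange(len(X[0]))
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], feature)
        forest.append((stump, oob))
    return forest


def oob_score(item, X, y):
    """Balanced accuracy of one tree on the samples left out of its bootstrap."""
    stump, oob = item
    return balanced_accuracy([y[i] for i in oob],
                             [predict_stump(stump, X[i]) for i in oob])


def select_trees(forest, X, y, keep):
    """Assumed selection rule: keep the `keep` trees with the best out-of-bag score."""
    ranked = sorted(forest, key=lambda item: oob_score(item, X, y), reverse=True)
    return [stump for stump, _ in ranked[:keep]]


def predict_forest(stumps, row):
    """Majority vote over the given trees."""
    return Counter(predict_stump(s, row) for s in stumps).most_common(1)[0][0]


def make_data(n, rng, minority_rate=0.1, n_noise=4):
    """Synthetic unbalanced data: 2 informative features followed by noise features."""
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority_rate else 0
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
        X.append(informative + [rng.gauss(0.0, 1.0) for _ in range(n_noise)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(150, rng)
X_test, y_test = make_data(150, rng)
forest = fit_forest(X_train, y_train, n_trees=30, rng=rng)
all_trees = [stump for stump, _ in forest]
selected = select_trees(forest, X_train, y_train, keep=10)
for name, trees in (("all trees", all_trees), ("selected trees", selected)):
    preds = [predict_forest(trees, row) for row in X_test]
    print(name, round(balanced_accuracy(y_test, preds), 3))
```

Balanced accuracy is used here instead of plain accuracy because, on unbalanced data, a tree that always predicts the majority class can look accurate while never detecting the minority class. Other criteria (for example AUC or per-class margins) could be substituted in `oob_score` without changing the rest of the sketch.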