Research on Text Feature Selection Algorithm Based on Information Gain and Feature Relation Tree

The classification performance of the conventional information gain (IG) algorithm can decline markedly when classes and features are unevenly distributed. To address this, an improved text feature selection method, UDsIG, is proposed. First, features are selected class by class to reduce the impact of an uneven class distribution on feature selection. Next, the distribution equilibrium of each feature is used to reduce the interference caused by unevenly distributed features. The features of each class are then processed with a feature relation tree model so that strongly correlated features are retained. Finally, an improved information gain formula based on weighted dispersion is used to obtain the optimal feature subset. Experimental results show that the proposed method achieves better classification performance.
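
For orientation, the baseline that UDsIG starts from can be sketched as follows. This is a minimal illustration of the conventional information gain score for a single term, IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], and not the improved weighted-dispersion formula, which the abstract does not specify. The function names, the representation of documents as sets of terms, and the toy corpus are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    """Conventional IG of `term`: class entropy minus the entropy that
    remains after splitting documents by presence or absence of the term.

    `docs` is a list of sets of terms (an assumed representation);
    `labels` holds one class label per document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: two classes, two documents each.
docs = [
    {"ball", "goal"},
    {"ball", "team"},
    {"vote", "law"},
    {"vote", "team"},
]
labels = ["sport", "sport", "politics", "politics"]

# Rank every term by its IG score, highest first.
vocabulary = sorted(set().union(*docs))
ranked = sorted(vocabulary, key=lambda t: information_gain(docs, labels, t), reverse=True)
```

In this toy corpus, "ball" and "vote" each separate the two classes perfectly and therefore receive the maximum score of 1 bit, whereas "team" occurs equally in both classes and scores 0. Because the score is computed over the whole corpus, a rare class contributes little to it, which is the kind of sensitivity to uneven distributions that the proposed method is intended to reduce.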