Research on an improved information gain method using distribution information of terms

A shortcoming of information gain is that it takes into account the case in which a term does not appear. In this paper, by analyzing the distribution information of terms, we find that when a term's Distribution Information inside a Class becomes large, the distribution of the term tends toward imbalance, and that when a term is highly imbalanced, its Distribution Information among Classes tends toward a smaller value. We therefore introduce the Distribution Information inside a Class and the Distribution Information among Classes to reduce the effect of term absence and to improve traditional information gain. Experiments show that the improved algorithm (GDI) performs better in some fields than traditional feature selection algorithms such as Information Gain.
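
To make the role of term absence concrete, the following minimal sketch computes traditional information gain from per-class document counts and reports the term-present and term-absent parts separately. It uses the standard textbook definition only. It does not implement GDI, whose formulas are not given here. The function names and the count-list input layout are illustrative assumptions.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of the distribution given by raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(present, absent):
    """Traditional information gain of a term.

    present[i]: number of documents of class i that contain the term.
    absent[i]:  number of documents of class i that do not contain it.
    Returns (ig, presence_part, absence_part), where
    ig = H(C) - presence_part - absence_part.
    """
    n_present = sum(present)
    n_absent = sum(absent)
    n = n_present + n_absent
    if n == 0:
        raise ValueError("no documents")

    class_totals = [p + a for p, a in zip(present, absent)]
    h_class = entropy(class_totals)

    # P(t) * H(C | t): conditional entropy weighted by term presence.
    presence_part = (n_present / n) * entropy(present)
    # P(not t) * H(C | not t): the part driven by term absence.
    absence_part = (n_absent / n) * entropy(absent)

    return h_class - presence_part - absence_part, presence_part, absence_part


if __name__ == "__main__":
    # Hypothetical counts: a rare term seen in 2 of 100 documents.
    ig, pres, abs_ = information_gain(present=[2, 0], absent=[48, 50])
    print(f"IG={ig:.4f} presence={pres:.4f} absence={abs_:.4f}")
```

For a rare term, almost all of the probability mass sits in the absence part, which is the behaviour the abstract identifies as a weakness of traditional information gain.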