Research and Application to Automatic Indexing

Building on a study of TF-IDF, information gain, and information entropy, this paper proposes an improved weight calculation method that combines normalized TF-IDF with information gain to extract keywords. Index terms are then derived from these keywords by computing their semantic similarity, which completes the automatic indexing process. Comparative experiments show that the index terms obtained with the modified weight calculation method achieve a higher comprehensive assessment value than those obtained with the traditional TF-IDF method.
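As a rough illustration of the keyword-weighting idea, the following Python sketch computes a length-normalized TF-IDF weight and an information gain score for each term, then combines them. The abstract does not state how the two quantities are combined, so the product used here is an assumption, as are the `log(N/df + 0.01)` smoothing, the use of document class labels for information gain, and all function names (`normalized_tfidf`, `information_gain`, `combined_weights`). It is a sketch under those assumptions, not the paper's actual formula.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """Reduction in class entropy from knowing whether `term` occurs in a document."""
    n = len(docs)
    base = entropy(list(Counter(labels).values()))
    present = [lab for d, lab in zip(docs, labels) if term in d]
    absent = [lab for d, lab in zip(docs, labels) if term not in d]
    conditional = 0.0
    for subset in (present, absent):
        if subset:
            conditional += len(subset) / n * entropy(list(Counter(subset).values()))
    return base - conditional


def normalized_tfidf(docs):
    """Per-document TF-IDF weights, scaled so each document vector has unit length."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    result = []
    for d in docs:
        tf = Counter(d)
        # Assumed smoothing: log(N/df + 0.01) keeps weights positive when df == N.
        raw = {t: tf[t] * math.log(n / df[t] + 0.01) for t in tf}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        result.append({t: (w / norm if norm else 0.0) for t, w in raw.items()})
    return result


def combined_weights(docs, labels):
    """Assumed combination: normalized TF-IDF multiplied by information gain."""
    vocab = {t for d in docs for t in d}
    ig = {t: information_gain(t, docs, labels) for t in vocab}
    return [{t: w * ig[t] for t, w in doc.items()} for doc in normalized_tfidf(docs)]


if __name__ == "__main__":
    # Toy corpus: tokenized documents with hypothetical class labels.
    docs = [
        ["stock", "market", "stock"],
        ["market", "bank"],
        ["football", "match"],
        ["match", "goal", "goal"],
    ]
    labels = ["finance", "finance", "sport", "sport"]
    for i, weights in enumerate(combined_weights(docs, labels)):
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        print(i, ranked)
```

Terms with the highest combined weight in each document would be the keyword candidates; the later semantic similarity step that selects index terms is not sketched here because the abstract gives no detail on it.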