Optimized TF-IDF Algorithm with the Adaptive Weight of Position of Word

The classical TF-IDF algorithm weights a word only by its term frequency and inverse document frequency and ignores other features of the word. Based on an analysis of Chinese expression habits, this paper proposes TF-IDF-AP, a TF-IDF-based algorithm with an adaptive word-position weight. TF-IDF-AP determines the position weight of a word dynamically according to the word's position. The paper introduces the vector space model (VSM) and designs a comparative experiment in a Chinese document clustering scenario. The results show that TF-IDF-AP improves the F-measure by 12.9% compared with the classical TF-IDF algorithm.

Keywords: text feature extraction; adaptive weight; weight of position; Term Frequency-Inverse Document Frequency (TF-IDF)
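
To make the idea of scaling TF-IDF by a position-dependent factor concrete, the following is a minimal sketch in Python. It is not the TF-IDF-AP formula, which this abstract does not state. The function names, the `boost` parameter, and the linear decay from the start of the document are all assumptions chosen for illustration only. The classical part uses tf = count / document length and idf = log(N / df).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classical TF-IDF over tokenized documents (lists of words)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def position_weight(first_index, doc_len, boost=1.0):
    """HYPOTHETICAL position weight, not taken from the paper.

    Decays linearly from 1 + boost for a word that first appears at the
    start of the document to 1 for a word that first appears at the end.
    """
    if doc_len <= 1:
        return 1.0 + boost
    return 1.0 + boost * (1.0 - first_index / (doc_len - 1))


def tf_idf_with_position(docs, boost=1.0):
    """Multiply each TF-IDF weight by the assumed position weight."""
    base = tf_idf(docs)
    result = []
    for doc, w in zip(docs, base):
        # Record the index of the first occurrence of each term.
        first = {}
        for i, t in enumerate(doc):
            first.setdefault(t, i)
        result.append(
            {t: w[t] * position_weight(first[t], len(doc), boost) for t in w}
        )
    return result


if __name__ == "__main__":
    docs = [["a", "b", "c"], ["a", "d", "d"]]
    for weights in tf_idf_with_position(docs):
        print(weights)
```

In this sketch, two words with the same TF-IDF value receive different final weights when one first appears earlier in the document. An adaptive scheme such as the one proposed would derive the position factor from the document itself rather than from a fixed `boost`.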