An Improved Text Retrieval Algorithm Based on Suffix Tree Similarity Measure

In information retrieval, popular methods consider only the word frequencies of the query terms and the text corpus. These methods ignore the word sequence information shared by the query and the documents, so they give good results only in some special domains. This paper analyzes the word sequence information and computes the similarity between the query and the documents in the corpus by applying a suffix tree similarity measure combined with the TF-IDF weighting method. Experimental results on the standard document benchmark corpus Reuters indicate that the new retrieval algorithm is effective. Compared with the traditional word-term TF-IDF similarity measure used in the same retrieval algorithm, the proposed method improves the average precision score by about 20%.
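
As an illustration of the idea summarized above, the following minimal sketch ranks documents against a query using phrase-level TF-IDF weights and cosine similarity. It is not the implementation evaluated in this paper: instead of building an explicit suffix tree, it enumerates every contiguous word sequence of up to `max_len` words, which corresponds to the node labels of a depth-limited word-level suffix trie. The function names, the `max_len` cutoff, and the particular TF and IDF formulas are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def phrases(text, max_len=3):
    """Return every contiguous word sequence of 1..max_len words.

    Assumption: these stand in for the node labels of a depth-limited
    word-level suffix trie; no real suffix tree is built here.
    """
    words = text.lower().split()
    out = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + max_len, len(words)) + 1):
            out.append(tuple(words[i:j]))
    return out


def weight(counts, df, n_docs):
    """TF-IDF weight per phrase (hypothetical smoothed variant)."""
    return {
        p: (1 + math.log(tf)) * math.log(1 + n_docs / (1 + df[p]))
        for p, tf in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse phrase-weight vectors."""
    dot = sum(w * v[p] for p, w in u.items() if p in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs, max_len=3):
    """Return (doc_index, score) pairs, best match first."""
    doc_counts = [Counter(phrases(d, max_len)) for d in docs]
    # Document frequency: number of documents containing each phrase.
    df = Counter(p for c in doc_counts for p in c)
    n = len(docs)
    q_vec = weight(Counter(phrases(query, max_len)), df, n)
    scores = [(i, cosine(q_vec, weight(c, df, n)))
              for i, c in enumerate(doc_counts)]
    return sorted(scores, key=lambda pair: -pair[1])


docs = ["the cat sat on the mat",
        "the mat sat on the cat",
        "dogs chase cats"]
for index, score in rank("cat sat on", docs):
    print(index, round(score, 3))
```

Setting `max_len=1` reduces the sketch to an ordinary bag-of-words TF-IDF baseline. In the toy example the first two documents contain the same words in a different order, so the baseline cannot separate them, while the phrase-based variant favors the document that preserves the query's word order.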