Improved context graph algorithm by using feature selection based on word frequency differentia

To address the low efficiency of traditional focused crawlers, the heuristic web crawler search algorithm Context Graph is analyzed and its deficiencies are identified. An optimization strategy is proposed that adopts an improved TF-IDF together with a feature selection method based on word frequency differentia, taking into account the varying importance of different parts of a web page's textual content. A new term weighting method for text categorization is presented that considers the distribution of feature words both among classes and within a class. Experimental results indicate that, compared with the other algorithms tested, this strategy crawls topic-relevant pages more efficiently.
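The abstract does not give the improved TF-IDF formula or the exact definition of word frequency differentia, so the following is only a minimal sketch under stated assumptions. It uses the textbook TF-IDF weight and, as a stand-in for the feature selection step, ranks terms by the difference between their relative frequency in the topic class and in the other classes. The function names `tf_idf`, `frequency_difference`, and `select_features` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Textbook TF-IDF (not the paper's improved variant).

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def frequency_difference(term, topic_docs, other_docs):
    """Assumed score: relative frequency in the topic class minus
    relative frequency in all other classes."""
    def rel_freq(docs):
        total = sum(len(d) for d in docs)
        return sum(d.count(term) for d in docs) / total if total else 0.0

    return rel_freq(topic_docs) - rel_freq(other_docs)


def select_features(topic_docs, other_docs, k):
    """Keep the k topic-class terms with the largest frequency difference.
    Ties are broken alphabetically so the result is deterministic."""
    vocab = {t for d in topic_docs for t in d}
    ranked = sorted(
        vocab,
        key=lambda t: (-frequency_difference(t, topic_docs, other_docs), t),
    )
    return ranked[:k]


if __name__ == "__main__":
    topic = [["crawler", "web", "topic"], ["crawler", "graph", "web"]]
    other = [["web", "news", "sport"], ["news", "web", "music"]]
    print(select_features(topic, other, 2))
```

In this toy data, a term such as "web" that is equally frequent inside and outside the topic class scores zero on the frequency difference and also receives a zero TF-IDF weight, which illustrates why a weighting that looks at both the among-class and within-class distribution can discard uninformative words.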