Research on a Theme Crawler Based on the Shark-Search and PageRank Algorithms

In theme crawlers, the Shark-Search algorithm gives insufficient consideration to the global information of Web pages. To compensate for this shortcoming, this paper uses the PageRank algorithm to compute the authority of each URL and proposes the Shark-PageRank algorithm, which measures the value of a URL using its anchor text, the context surrounding the anchor text, and the authority value of the Web page. Experimental results show that the new algorithm improves the speed and accuracy of queries and has good stability and scalability.
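
As a rough illustration of the idea, the following Python sketch scores a candidate URL from the three signals named above: anchor text, nearby context, and page authority. It is a minimal sketch under stated assumptions, not the paper's actual formulation: the linear combination, the weights `w_anchor`, `w_context` and `w_authority`, the bag-of-words cosine similarity, and the function names `pagerank`, `cosine_similarity` and `url_score` are all hypothetical choices made here for illustration. The authority value is assumed to be a PageRank score rescaled to the range [0, 1].

```python
import math
from collections import Counter


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def cosine_similarity(text, topic):
    """Bag-of-words cosine similarity between two whitespace-tokenised strings."""
    a = Counter(text.lower().split())
    b = Counter(topic.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def url_score(anchor, context, authority, topic,
              w_anchor=0.5, w_context=0.3, w_authority=0.2):
    """Combine the three signals linearly (weights are illustrative assumptions)."""
    return (w_anchor * cosine_similarity(anchor, topic)
            + w_context * cosine_similarity(context, topic)
            + w_authority * authority)


if __name__ == "__main__":
    # Hypothetical four-page link graph; "d" has no out-links.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
    ranks = pagerank(graph)
    top = max(ranks.values())
    authority = {page: value / top for page, value in ranks.items()}
    score = url_score("focused web crawler",
                      "a survey of focused web crawler strategies",
                      authority["c"],
                      topic="focused web crawler")
    print(round(score, 3))
```

A crawler built along these lines would presumably keep unvisited URLs in a priority queue ordered by `url_score`, so that links which are both topically relevant and authoritative are fetched first.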