Characterizing Web pages based on the query likelihoods of neighboring pages

The World Wide Web contains a massive number of Web pages, which makes it difficult for users to find useful information accurately. Most Web search engines are built by considering Web page content and link structure. Recently, probabilistic language models have been developed to improve the retrieval accuracy of Web search. A probabilistic language model computes the probability of the query terms under each Web page, and this probability is the page's query likelihood. This approach provides a similarity measure between a Web page and the user's query. However, it does not consider the relationship, or the relative similarity, between a Web page and its neighboring pages, and therefore it does not characterize the Web page precisely. To address this problem, we propose methods for characterizing Web pages that take into account the relationship between a Web page and its neighboring pages. Our methods calculate query likelihoods of Web pages that reflect their neighboring pages, so that users can more accurately reach Web pages that satisfy their information needs. As a result, our methods can search Web pages more accurately than traditional Web search techniques.
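To make the query-likelihood idea concrete, the following is a minimal sketch under stated assumptions. It is not the method proposed in this paper. It scores a page by the log-probability of the query terms under a unigram page model, using Jelinek-Mercer smoothing with a collection model (a standard choice assumed here, with weight `lam`). It then illustrates one hypothetical way to let neighboring pages influence a page's characterization: the page model is linearly interpolated with the average model of its linked neighbors (weight `alpha`). The function names, the toy pages, the link graph, and both weights are illustrative assumptions.

```python
import math
from collections import Counter


def language_model(text):
    """Maximum-likelihood unigram model: term -> relative frequency."""
    terms = text.lower().split()
    counts = Counter(terms)
    total = len(terms)
    return {t: c / total for t, c in counts.items()}


def query_log_likelihood(query, page_model, collection_model, lam=0.5):
    """log P(q | page) with Jelinek-Mercer smoothing (assumed smoothing choice).

    The tiny floor on the collection probability only guards against log(0)
    for query terms that never occur in the collection.
    """
    score = 0.0
    for term in query.lower().split():
        p = (1 - lam) * page_model.get(term, 0.0) + lam * collection_model.get(term, 1e-9)
        score += math.log(p)
    return score


def neighbor_smoothed_model(page_model, neighbor_models, alpha=0.7):
    """Hypothetical: interpolate a page's model with the mean of its neighbors' models."""
    if not neighbor_models:
        return dict(page_model)
    vocab = set(page_model)
    for m in neighbor_models:
        vocab |= set(m)
    mixed = {}
    for t in vocab:
        neighbor_avg = sum(m.get(t, 0.0) for m in neighbor_models) / len(neighbor_models)
        mixed[t] = alpha * page_model.get(t, 0.0) + (1 - alpha) * neighbor_avg
    return mixed


# Toy data (hypothetical): three pages and an undirected neighbor relation.
pages = {
    "a": "web search engine ranking",
    "b": "language model for retrieval",
    "c": "cooking recipes pasta",
}
links = {"a": ["b"], "b": ["a"], "c": []}

models = {pid: language_model(text) for pid, text in pages.items()}
collection = language_model(" ".join(pages.values()))
query = "language model retrieval"

for pid in pages:
    plain = query_log_likelihood(query, models[pid], collection)
    mixed_model = neighbor_smoothed_model(models[pid], [models[n] for n in links[pid]])
    smoothed = query_log_likelihood(query, mixed_model, collection)
    print(f"{pid}: plain={plain:.3f} neighbor-smoothed={smoothed:.3f}")
```

In this toy setting, page `a` contains none of the query terms, so its plain score comes only from the collection model. Its neighbor-smoothed score is higher because its linked neighbor `b` is about language-model retrieval. Page `c` has no neighbors, so its model is unchanged. How neighbors should actually be weighted is exactly what the proposed methods address; the fixed `alpha` here is only a placeholder.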