Refinement of Web Communities Based on Graph Structure of Hyperlinks

Discovering representative Web pages on specific topics is important for helping users retrieve information from the Web. Research on Web structure mining, which aims to discover or rank important Web pages based on the graph structure of hyperlinks, has been very active recently. A complete bipartite subgraph of the Web graph, composed of centers (pages containing useful information on a specific topic) and fans (pages containing hyperlinks to the centers), can be regarded as a Web community sharing a common interest. Although Murata’s method for discovering Web communities is a simple way to find related Web pages, it has two weaknesses: (1) since the number of centers increases monotonically, pages irrelevant to the members of a Web community may be added during discovery, and (2) since the number of fans decreases monotonically as the number of centers increases, the method may suffer from topic drift. This paper describes an improved method for refining Web communities in order to acquire representative Web pages on the topics of the input Web communities. The method is based on the assumptions that most fans contain hyperlinks pointing to representative pages on their topic, and that hyperlinks to pages of the same quality often co-occur. In the new method, both fans and centers are updated iteratively according to a majority vote among the members of the previous Web community. Experimental results show that the new method is able to find desirable pages for several topics.
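
The iterative majority-vote refinement described above can be illustrated with a minimal Python sketch. This is not the paper's exact procedure: the function name `refine_community`, the `out_links` adjacency dictionary, the 0.5 voting threshold, and the fixed-point stopping rule are all assumptions made for illustration. In the sketch, a page is kept as a center if more than half of the current fans link to it, and a page is kept as a fan if it links to more than half of the resulting centers.

```python
from collections import Counter


def refine_community(out_links, fans, centers, threshold=0.5, max_iters=20):
    """Illustrative majority-vote refinement of a (fans, centers) pair.

    out_links: dict mapping each page to the set of pages it links to
               (hypothetical input format, not taken from the paper).
    threshold: fraction of votes a page must exceed to be kept (assumed 0.5).
    """
    fans, centers = set(fans), set(centers)
    for _ in range(max_iters):
        # Each current fan casts one vote for every page it links to.
        votes = Counter(t for f in fans for t in set(out_links.get(f, ())))
        new_centers = {p for p, v in votes.items() if v > threshold * len(fans)}

        # A page becomes a fan if it links to a majority of the new centers.
        new_fans = {
            page
            for page, targets in out_links.items()
            if new_centers
            and len(set(targets) & new_centers) > threshold * len(new_centers)
        }

        # Stop once neither set changes (assumed stopping rule).
        if new_fans == fans and new_centers == centers:
            break
        fans, centers = new_fans, new_centers
    return fans, centers
```

Because both sets are recomputed from the votes of the previous community, either set can grow or shrink between iterations, in contrast to the monotonic growth of centers and monotonic shrinkage of fans noted as weaknesses above.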