Extract and rank web communities

A web community is a pattern in the WWW, understood as a set of related web pages. In this paper, we propose an efficient algorithm to find the web communities on a given topic. Instead of working on the whole web graph, we work on a web domain that we extract from the topic-specific search results. As a result, the extracted communities are highly related to the search topic. The rank of a community denotes the degree of relevance between the search query and that community. We introduce an approach for ranking the extracted communities based on their dense bipartite patterns. Ranking significantly improves the relevance of the extracted communities to the search topic.
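
The abstract does not spell out the concrete steps, so the following is only a minimal Python sketch of such a pipeline under our own assumptions, not the authors' algorithm. We assume (1) the web domain is the topic's search results plus their one-hop in-link and out-link neighbours, in the style of Kleinberg's HITS base set; (2) communities are the connected components that survive trawling-style degree pruning of fan-to-center links, as in Kumar et al.; and (3) a community's rank score is its bipartite edge density |E| / (|F|·|C|), with ties broken by edge count. All function names (`build_domain`, `prune`, `split_communities`, `density`, `rank_communities`) and the `min_fans`/`min_centers` thresholds are hypothetical, and `seeds` merely stands in for real search results.

```python
from collections import defaultdict

# Hypothetical sketch: names, thresholds, and the density score are
# illustrative assumptions, not the method described in the paper.


def build_domain(seeds, out_links):
    """Expand the search results (`seeds`) with the pages they link to and the
    pages linking to them, then keep only links with both ends in the domain."""
    in_links = defaultdict(set)
    for src, dsts in out_links.items():
        for dst in dsts:
            in_links[dst].add(src)
    domain = set(seeds)
    for page in seeds:
        domain.update(out_links.get(page, ()))
        domain.update(in_links.get(page, ()))
    return {(s, d) for s in domain for d in out_links.get(s, ()) if d in domain}


def prune(edges, min_fans, min_centers):
    """Trawling-style pruning: repeatedly drop links from fans with fewer than
    `min_centers` out-links and links to centers with fewer than `min_fans`
    in-links, until no more links are removed."""
    edges = set(edges)
    while True:
        out_deg, in_deg = defaultdict(int), defaultdict(int)
        for fan, center in edges:
            out_deg[fan] += 1
            in_deg[center] += 1
        kept = {(f, c) for f, c in edges
                if out_deg[f] >= min_centers and in_deg[c] >= min_fans}
        if kept == edges:
            return edges
        edges = kept


def split_communities(edges):
    """Treat fans and centers as the two sides of a bipartite graph and return
    each connected component as a (fans, centers, edges) triple."""
    adj = defaultdict(set)
    for f, c in edges:
        adj[("F", f)].add(("C", c))
        adj[("C", c)].add(("F", f))
    seen, result = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        fans = {page for role, page in comp if role == "F"}
        centers = {page for role, page in comp if role == "C"}
        result.append((fans, centers,
                       {(f, c) for f, c in edges if f in fans and c in centers}))
    return result


def density(fans, centers, comm_edges):
    """Fraction of all possible fan-to-center links present (1.0 = complete)."""
    return len(comm_edges) / (len(fans) * len(centers))


def rank_communities(edges, min_fans=2, min_centers=2):
    """Prune, split into communities, and sort by density, then by edge count."""
    comms = split_communities(prune(edges, min_fans, min_centers))
    return sorted(comms, key=lambda t: (density(*t), len(t[2])), reverse=True)


if __name__ == "__main__":
    links = {"a1": ["x", "y"], "a2": ["x", "y"], "a3": ["x", "y"],
             "b1": ["p", "q"], "b2": ["q", "r"], "b3": ["p", "r"],
             "n1": ["x"], "n2": ["z"], "n3": ["z"]}
    seeds = ["x", "y", "p", "q", "r"]  # stand-in for the topic's search results
    for fans, centers, e in rank_communities(build_domain(seeds, links)):
        print(sorted(fans), sorted(centers), round(density(fans, centers, e), 2))
    # prints:
    # ['a1', 'a2', 'a3'] ['x', 'y'] 1.0
    # ['b1', 'b2', 'b3'] ['p', 'q', 'r'] 0.67
```

In this toy run the domain step already discards the unrelated pages n2, n3 and z, pruning removes the stray link from n1 to x, and the complete 3×2 core outranks the sparser 3×3 one. Degree pruning only guarantees minimum degrees, not a complete biclique, which is why a density score helps separate tight cores from looser ones.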