CWS: a comparative web search system

In this paper, we define and study a novel search problem: Comparative Web Search (CWS). The task of CWS is to find relevant and comparative information on the Web to help users compare a set of topics. We develop a system, also called CWS, to serve Web users' comparison needs. Given a set of queries representing the topics that a user wants to compare, the system provides: (1) automatic retrieval and ranking of Web pages based on both their relevance to the queries and the comparative content they contain; (2) automatic clustering of the comparative content into semantically meaningful themes; and (3) extraction of representative keyphrases that summarize the commonalities and differences of the comparative content in each theme. We also develop a novel interface that supports two view modes: a pair-view, which displays the results at the page level, and a cluster-view, which organizes the comparative pages into themes and displays the extracted keyphrases to facilitate comparison. Experimental results show that the CWS system is effective and efficient.
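
To make the first characteristic concrete, the following is a minimal sketch of ranking page pairs by combining relevance with comparative content. It is an illustration under stated assumptions, not the scoring used by the CWS system: the term-overlap `relevance`, the Jaccard-based `comparativeness`, the linear weight `alpha`, and all function names are hypothetical choices made for this example.

```python
from itertools import product


def tokenize(text):
    """Lowercase whitespace tokenization into a set of terms."""
    return set(text.lower().split())


def relevance(query, page):
    """Fraction of query terms found in the page (hypothetical stand-in)."""
    q = tokenize(query)
    return len(q & tokenize(page)) / len(q) if q else 0.0


def comparativeness(page_a, page_b, query_a, query_b):
    """Jaccard overlap of the non-query terms shared by two pages (assumed proxy)."""
    query_terms = tokenize(query_a) | tokenize(query_b)
    a = tokenize(page_a) - query_terms
    b = tokenize(page_b) - query_terms
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(query_a, pages_a, query_b, pages_b, alpha=0.5):
    """Score every cross-topic page pair and sort by descending score.

    The score is an assumed linear mix: alpha * relevance of the pair
    plus (1 - alpha) * comparativeness of the pair.
    """
    scored = []
    for pa, pb in product(pages_a, pages_b):
        rel = relevance(query_a, pa) * relevance(query_b, pb)
        comp = comparativeness(pa, pb, query_a, query_b)
        scored.append((alpha * rel + (1 - alpha) * comp, pa, pb))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

With toy pages for the queries "canon camera" and "nikon camera", a pair in which both pages discuss battery life would rank above pairs in which the pages match their queries but share no other terms.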
