C4-2: Combining Link and Contents in Clustering Web Search Results to Improve Information Interpretation

With information proliferating on the web, it is far beyond human ability to digest this huge, heterogeneous body of information, for example to locate related resources and to interpret the information they contain. While web search engines can retrieve information on the Web for a specific topic, users have to step through a long ordered list to locate the information they need, which is often tedious and frustrating. In this paper, we investigate how to combine link and content analysis in clustering web search results to improve information interpretation for a specific topic. By filtering out some irrelevant pages, the proposed approach clusters the high-quality pages in web search results into semantically meaningful groups, tagged with additional keywords to help users access and understand them. In particular, we study the contributions of links and contents to the clustering procedure. Preliminary experiments and evaluations are conducted to investigate the effectiveness of the approach.
