Personalized spiders for web search and analysis

Searching for useful information on the World Wide Web has become increasingly difficult. Although Internet search engines help people search the web, low recall rates and outdated indexes have become increasingly problematic as the web grows. In addition, search tools usually present users with only a list of search results and provide no further personalized analysis that could help users identify useful information and comprehend the results. To alleviate these problems, we propose a client-based architecture that incorporates noun phrasing and self-organizing map techniques. Two systems, CI Spider and Meta Spider, have been built on this architecture. User evaluation studies suggest that the proposed architecture can effectively facilitate web search and analysis.
