Visualizing web site comparisons

The Web is becoming an increasingly important channel for conducting business, disseminating information, and communicating with people on a global scale. More and more companies, organizations, and individuals are publishing their information on the Web. With all this information publicly available, companies and individuals naturally want to find useful information in these Web pages. For example, companies always want to know what their competitors are doing and what products and services they are offering. With this knowledge, a company can learn from its competitors and/or design countermeasures to improve its own competitiveness. The ability to find such business intelligence information effectively is becoming crucial to the survival and growth of any company. Despite its importance, little work has been done in this area. In this paper, we propose a novel visualization technique that helps the user find useful information on a competitor's Web site easily and quickly. The technique visualizes, with the help of a clustering system, a comparison of the user's Web site and the competitor's Web site to reveal similarities and differences between the two sites. The visualization is designed so that, at a single glance, the user can see the key similarities and differences between the two sites. He/she can then quickly focus on the interesting clusters and pages and browse their details. Experimental results and practical applications show that the technique is effective.
