Clustering e-commerce search engines based on their search interface pages using WISE-Cluster

In this paper, we propose a new approach to clustering e-commerce search engines (ESEs) on the Web. Our approach uses the features available on the interface page of each ESE: the label terms and value terms appearing in the search form, the number of images, normalized price terms, and the remaining terms on the page. Experimental results on more than 400 ESEs indicate that the proposed approach achieves good clustering accuracy. We also analyze the importance of the different feature types and find that the terms in the search form are the most important feature for obtaining high-quality clusters.
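
To make the feature-based comparison concrete, the following is a minimal sketch of how two interface pages might be compared using the feature types listed above. It is not the WISE-Cluster algorithm itself: the abstract does not specify the similarity function, the weights, or how price terms are normalized. The function names, the `WEIGHTS` values, the dictionary layout, and the sample pages are all hypothetical, and price terms are assumed to arrive already normalized into bucket labels.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency bags (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def count_similarity(x: int, y: int) -> float:
    """Similarity of two non-negative counts, in [0, 1]."""
    return 1.0 if x == y else min(x, y) / max(x, y)


# Hypothetical weights (assumption, not from the paper). Form terms get the
# largest weight only to mirror the abstract's finding that they matter most.
WEIGHTS = {"form_terms": 0.5, "price_terms": 0.2, "other_terms": 0.2, "images": 0.1}


def ese_similarity(p: dict, q: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of per-feature similarities between two interface pages."""
    return (
        weights["form_terms"] * cosine(p["form_terms"], q["form_terms"])
        + weights["price_terms"] * cosine(p["price_terms"], q["price_terms"])
        + weights["other_terms"] * cosine(p["other_terms"], q["other_terms"])
        + weights["images"] * count_similarity(p["images"], q["images"])
    )


# Made-up interface pages, for illustration only.
book_a = {
    "form_terms": Counter(["title", "author", "isbn", "keyword"]),
    "price_terms": Counter(["price_low", "price_low", "price_mid"]),
    "other_terms": Counter(["book", "bestseller", "paperback"]),
    "images": 12,
}
book_b = {
    "form_terms": Counter(["title", "author", "publisher"]),
    "price_terms": Counter(["price_low", "price_mid"]),
    "other_terms": Counter(["book", "hardcover", "paperback"]),
    "images": 9,
}
car = {
    "form_terms": Counter(["make", "model", "year", "zip"]),
    "price_terms": Counter(["price_high"]),
    "other_terms": Counter(["car", "dealer", "used"]),
    "images": 30,
}

print(ese_similarity(book_a, book_b))
print(ese_similarity(book_a, car))
```

With these made-up pages, the two book sites score much higher against each other than either does against the car site. A pairwise score of this kind could then be passed to any standard clustering procedure.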
