Towards automatic incorporation of search engines into a large-scale metasearch engine

A metasearch engine provides unified access to multiple component search engines. Building a very large-scale metasearch engine that can access up to hundreds of thousands of component search engines poses a major challenge: incorporating large numbers of autonomous search engines effectively. To address this problem, we propose techniques for automatic search engine discovery, automatic search engine connection, and automatic search engine result extraction. Experiments indicate that these techniques are highly effective and efficient.
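The abstract names automatic search engine connection without describing how it works. As a rough illustration only, and not the authors' technique, one simple heuristic is to parse a search engine's interface page, pick the first GET form that contains a text box, and build a query URL from it. The names `SearchFormFinder` and `build_query_url` are hypothetical. The sketch assumes a plain HTML form and ignores POST forms, JavaScript-driven forms, and form actions that already carry a query string.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class SearchFormFinder(HTMLParser):
    """Hypothetical helper: records each <form> with its first text field and hidden fields."""

    def __init__(self):
        super().__init__()
        self.forms = []        # completed forms: action, method, text_field, hidden
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "text_field": None,
                "hidden": {},
            }
        elif tag == "input" and self._current is not None:
            kind = (attrs.get("type") or "text").lower()  # HTML default type is "text"
            name = attrs.get("name")
            if not name:
                return  # unnamed inputs (e.g. a bare submit button) are not submitted
            if kind in ("text", "search") and self._current["text_field"] is None:
                self._current["text_field"] = name  # assume the first text box takes the query
            elif kind == "hidden":
                self._current["hidden"][name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def build_query_url(page_url, html, query):
    """Return a GET URL that submits `query` to the first usable form, or None."""
    finder = SearchFormFinder()
    finder.feed(html)
    for form in finder.forms:
        if form["method"] == "get" and form["text_field"]:
            params = dict(form["hidden"])          # keep hidden fields the engine expects
            params[form["text_field"]] = query
            return urljoin(page_url, form["action"]) + "?" + urlencode(params)
    return None  # no GET form with a text box was found
```

For example, given the page `<form action="/search" method="get"><input type="hidden" name="lang" value="en"><input type="text" name="q"></form>` served at `http://example.com/index.html`, the query `deep web` yields `http://example.com/search?lang=en&q=deep+web`. A production system would need far more robust form classification than "first text box in the first GET form".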
