Automated Ontology-Driven Metasearch Generation with Metamorph

We present Metamorph, a system and framework for generating vertical deep Web search engines in a knowledge-based way. The approach separates the role of a highly skilled ontology engineer from that of a less skilled service engineer, who adds new web sources in an intuitive, semi-automatic manner using the proven Lixto suite. One part of the framework is the process of understanding complex web search forms and generating an ontological representation of each form together with its intrinsic run-time dependencies. Based on these representations, a unified meta form is created, along with matchings from the meta form to the individual search forms and vice versa; these matchings take into account the differing types, contents, and labels of form elements. We discuss several aspects of the Metamorph ontology, which focuses especially on the interaction semantics of web forms, and give a short account of our semi-automatic tagging system.
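The idea of a meta form with matchings to individual search forms can be illustrated with a minimal sketch. It is not the Metamorph ontology or its matching algorithm: the `FormField` record, the label normalization, and the site and parameter names are all hypothetical, and matching here uses only a crude label key plus a check of select-box contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormField:
    # Hypothetical, simplified description of one search-form element.
    name: str            # parameter name submitted by the form
    label: str           # visible label text
    kind: str            # element type, e.g. "text" or "select"
    options: tuple = ()  # values offered by select-like elements


def normalize(label: str) -> str:
    """Reduce a label to a crude matching key: lowercase letters and digits only."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def build_meta_form(forms: dict) -> dict:
    """Group the fields of all source forms by normalized label.

    Returns {meta_key: {source_name: FormField}}.
    """
    meta = {}
    for source, fields in forms.items():
        for f in fields:
            meta.setdefault(normalize(f.label), {})[source] = f
    return meta


def translate_query(meta: dict, query: dict, source: str) -> dict:
    """Map a meta-form query onto the parameter names of one source form."""
    params = {}
    for meta_key, value in query.items():
        target = meta.get(meta_key, {}).get(source)
        if target is None:
            continue  # this source has no matching element
        if target.kind == "select" and value not in target.options:
            continue  # value is not offered by this source's select box
        params[target.name] = value
    return params


# Two made-up source forms with differently labelled elements.
forms = {
    "site_a": [FormField("q_dest", "Destination", "text"),
               FormField("cls", "Class", "select", ("economy", "business"))],
    "site_b": [FormField("to", "destination:", "text")],
}
meta = build_meta_form(forms)
query = {"destination": "Vienna", "class": "economy"}
print(translate_query(meta, query, "site_a"))
print(translate_query(meta, query, "site_b"))
```

In this sketch the reverse direction, from an individual form back to the meta form, is available by looking up a field's normalized label in `meta`. Run-time dependencies between form elements are not modelled.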
