On the Design and Exploitation of Presentation Ontologies for Information Extraction

The ontologies used as input to information extraction are mostly rather simple in structure. In this paper we report on our ongoing effort to use rich ontologies with numerous constraints over the information to be extracted. Important aspects of the approach are the coupling of user-defined ontologies with other sources of knowledge, such as training data and document formatting structures, and the distinction between proper domain ontologies and so-called presentation ontologies, where the latter (as ‘pragmatic bridges’ over the ‘semantic gap’) can partially be derived from the former. The extraction tool under construction builds on experience from an ongoing application in the domain of product catalogue analysis, and aims to remain flexible about which of the various input information sources are available.
