iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text

Due to the huge amount of text data on the WWW, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes the incorporation of domain ontologies into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies, which hold formalized and structured domain knowledge, to extract domain-relevant information from unannotated and unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies for performing information extraction tasks. This work outlines iDocument's ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
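To make the idea of a query acting as an extraction template more concrete, the following is a minimal sketch and not iDocument's implementation. It assumes a toy in-memory ontology, hypothetical `ex:` identifiers, and a simplified matcher that evaluates only basic triple patterns. The SPARQL string shows what such a template might look like; the sketch does not parse it but uses an equivalent hand-written pattern list. A template binding is kept only if the labels of all bound instances occur in the text.

```python
import re

# Hypothetical domain ontology as (subject, predicate, object) triples.
ONTOLOGY = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "rdfs:label", "Alice Smith"),
    ("ex:dfki", "rdf:type", "ex:Organization"),
    ("ex:dfki", "rdfs:label", "DFKI"),
    ("ex:alice", "ex:worksFor", "ex:dfki"),
]

# Illustrative only: how the template could be written in SPARQL.
# This string is not evaluated by the sketch.
TEMPLATE_SPARQL = """
SELECT ?person ?org WHERE {
  ?person rdf:type ex:Person .
  ?org rdf:type ex:Organization .
  ?person ex:worksFor ?org .
}
"""

# The same template as a list of triple patterns for the toy matcher.
TEMPLATE = [
    ("?person", "rdf:type", "ex:Person"),
    ("?org", "rdf:type", "ex:Organization"),
    ("?person", "ex:worksFor", "?org"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


def annotate(text, patterns, triples):
    """Return template bindings whose instance labels all appear in the text."""
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    results = []
    for binding in match(patterns, triples):
        mentioned = all(
            re.search(r"\b" + re.escape(labels[v]) + r"\b", text)
            for v in binding.values()
            if v in labels
        )
        if mentioned:
            results.append(binding)
    return results


annotations = annotate("Alice Smith joined DFKI in 2008.", TEMPLATE, ONTOLOGY)
```

Because the template only refers to ontology terms, swapping in a different domain ontology and a matching template would leave `match` and `annotate` unchanged, which is the kind of exchangeability the abstract describes.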
