Web based Knowledge Extraction and Consolidation for Automatic Ontology Instantiation

The Web is probably the largest and richest information repository available today. Search engines are the common access routes to this valuable source. However, the role of these search engines is often limited to the retrieval of lists of potentially relevant documents. The burden of analysing the returned documents and identifying the knowledge of interest is therefore left to the user. The Artequakt system aims to deploy natural language tools to automatically extract and consolidate knowledge from web documents and instantiate a given ontology, which dictates the type and form of knowledge to extract. Artequakt focuses on the domain of artists, and uses the harvested knowledge to generate tailored biographies. This paper describes the latest developments of the system and discusses the problem of knowledge consolidation.
