Ontologies as facilitators for repurposing web documents

This paper investigates the role of ontologies as a central part of an architecture for repurposing existing material from the web. A prototype system called ArtEquAKT is presented, which combines information extraction, knowledge management and consolidation techniques, and adaptive document generation. All of these components are co-ordinated by one central ontology, which provides a common vocabulary for describing information fragments as they are processed. Each component of the architecture is described in detail, and an evaluation of the system is discussed. Conclusions are drawn about the effectiveness of this approach, and further challenges are outlined.
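To make the idea of "one central ontology as a common vocabulary" concrete, the following is a minimal sketch under stated assumptions; it is not ArtEquAKT's implementation. The ontology contents, the `Fragment` type, the functions `is_valid`, `consolidate` and `generate`, and the biographical example data are all hypothetical. The sketch shows the three stages (extraction, consolidation, generation) agreeing on the same classes and relations. Extracted facts are checked against the ontology's relation signatures, duplicates from different sources are merged, and a simple text is generated using only ontology terms.

```python
from dataclasses import dataclass

# Hypothetical ontology: allowed classes and relation signatures (domain, range).
ONTOLOGY = {
    "classes": {"Person", "Place", "Date"},
    "relations": {
        "place_of_birth": ("Person", "Place"),
        "date_of_birth": ("Person", "Date"),
    },
}


@dataclass(frozen=True)
class Fragment:
    """A fact extracted from a web page, expressed in ontology terms."""
    subject: str
    relation: str
    obj: str
    source: str


def is_valid(fragment, types):
    """Extraction check: keep a fact only if its relation and argument types fit the ontology."""
    if fragment.relation not in ONTOLOGY["relations"]:
        return False
    domain, rng = ONTOLOGY["relations"][fragment.relation]
    return types.get(fragment.subject) == domain and types.get(fragment.obj) == rng


def consolidate(fragments):
    """Consolidation: merge identical facts from different sources, counting supporting sources."""
    support = {}
    for f in fragments:
        key = (f.subject, f.relation, f.obj)
        support.setdefault(key, set()).add(f.source)
    return {key: len(sources) for key, sources in support.items()}


def generate(person, kb):
    """Generation: render the facts about one person as a single line of text."""
    parts = []
    for (subj, rel, obj), count in sorted(kb.items()):
        if subj == person:
            parts.append(f"{rel.replace('_', ' ')}: {obj} ({count} source(s))")
    return f"{person}: " + "; ".join(parts)


# Hypothetical example data; entity types would come from an extraction step.
types = {"Rembrandt": "Person", "Leiden": "Place", "1606": "Date"}
fragments = [
    Fragment("Rembrandt", "place_of_birth", "Leiden", "a.html"),
    Fragment("Rembrandt", "place_of_birth", "Leiden", "b.html"),
    Fragment("Rembrandt", "date_of_birth", "1606", "a.html"),
    Fragment("Leiden", "place_of_birth", "Rembrandt", "c.html"),  # wrong types: rejected
]
valid = [f for f in fragments if is_valid(f, types)]
kb = consolidate(valid)
print(generate("Rembrandt", kb))
# → Rembrandt: date of birth: 1606 (1 source(s)); place of birth: Leiden (2 source(s))
```

Because every stage refers to the same relation names and type signatures, a malformed extraction such as the reversed `place_of_birth` fact is filtered out before it can reach the knowledge base or the generated text.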
