Semantic Annotation for Indexing Archaeological Context: A Prototype Development and Evaluation

This paper discusses the development of Semantic Annotations, a form of metadata that assigns conceptual entities to textual instances, in this case drawn from archaeological grey literature. Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process. The paper explores the use of Ontology Oriented Information Extraction (OOIE) methods to define rich, semantically aware indices of archaeological documents. The annotation process follows a rule-based information extraction approach using GATE. In particular, the paper describes the development of a prototype that adopts the core ontology CIDOC CRM, together with an English Heritage archaeological extension, to inform and direct the information extraction effort. The prototype evaluation supports the assumptions made about the capability of the method to construct rich indices of grey literature documents empowered by Semantic Annotations.
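As a rough illustration of the general idea of rule-based, ontology-oriented annotation, the following Python sketch matches terms from a small lookup list and labels each match with an ontology class, then builds a concept-to-document index. It is not the paper's GATE pipeline: the term list, the term-to-class mapping, and the names `GAZETTEER`, `Annotation`, `annotate`, and `build_index` are all hypothetical, and the CIDOC CRM class labels are used only as example targets.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer: surface term -> illustrative ontology class label.
# The mapping is an assumption for demonstration, not the paper's actual rules.
GAZETTEER = {
    "pit": "E53 Place",
    "ditch": "E53 Place",
    "pottery": "E19 Physical Object",
    "coin": "E19 Physical Object",
    "roman": "E49 Time Appellation",
    "medieval": "E49 Time Appellation",
}

# One whole-word, case-insensitive pattern over all gazetteer terms
# (longest terms first, so longer entries win if terms ever overlap).
_TERMS = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in _TERMS) + r")\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the match begins
    end: int      # character offset just past the match
    text: str     # matched surface string
    concept: str  # ontology class assigned to the match


def annotate(text):
    """Return one Annotation per gazetteer term found in the text."""
    return [
        Annotation(m.start(), m.end(), m.group(0), GAZETTEER[m.group(0).lower()])
        for m in _PATTERN.finditer(text)
    ]


def build_index(docs):
    """Map each concept to the set of document ids in which it was annotated."""
    index = {}
    for doc_id, text in docs.items():
        for ann in annotate(text):
            index.setdefault(ann.concept, set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = {
        "report-1": "Roman pottery was found in the ditch.",
        "report-2": "A medieval coin lay at the base of the pit.",
    }
    for ann in annotate(docs["report-1"]):
        print(ann)
    print(build_index(docs))
```

A real system of the kind described would rely on richer linguistic rules (for example JAPE grammars in GATE) and a full ontology rather than a flat dictionary; the sketch only shows how annotations carrying concept labels can feed a semantic index.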
