The Semantic Web (SW) needs semantically based structured content both to enable better document retrieval and to empower semantically aware agents. One prerequisite for the SW is the widespread adoption of such structured knowledge; without universal adoption, automated methods must be employed to generate structured content from the existing unstructured web. Most current technologies for creating structured content are based on static, human-centred annotation of documents, very often completely manual. Manual annotation is time-consuming and can introduce noise (Ciravegna et al., 2002) in the form of incomplete or incorrect annotations, which decreases the quality of the information. For these reasons, we believe that the SW needs automatic methods for annotating content. Automatic annotation services such as SemTag (Dill et al., 2003) and Armadillo (Dingli et al., 2003) aim to solve this problem by providing SW content automatically. In this paper we describe the Armadillo approach to automatic annotation and detail the methods it employs internally for integrating elicited knowledge and ensuring its consistency. Armadillo is a tool, developed at Sheffield, for extracting and integrating information from large repositories (e.g. the Web). The methodology employed for validating and integrating the information is a series of weak evidential similarity tests, implemented through a library of String Metrics, detailed in section 2. The paper then presents the Armadillo tool and details a use case that relates it to the methodologies used.
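To make the idea of weak evidential similarity tests concrete, the following is a minimal sketch rather than Armadillo's actual implementation. The text above does not say which string metrics the library contains, so the choice of Levenshtein edit distance and token-level Jaccard overlap is an assumption, as are the function names and the `threshold` value.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalised to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def token_jaccard(a: str, b: str) -> float:
    """Overlap of lower-cased whitespace tokens, ignoring word order."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def same_entity(a: str, b: str, threshold: float = 0.6) -> bool:
    """Combine several weak tests; no single metric decides alone.

    The unweighted mean and the threshold are illustrative assumptions.
    """
    scores = [edit_similarity(a.lower(), b.lower()), token_jaccard(a, b)]
    return sum(scores) / len(scores) >= threshold
```

Averaging the scores means that a single metric cannot force a match on its own, which is one simple way to treat each test as weak evidence; a real system would likely weight and tune the individual metrics.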
[1] Alexiei Dingli, et al. User-System Cooperation in Document Annotation Based on Information Extraction. EKAW, 2002.
[2] Alexiei Dingli, et al. Automatic semantic annotation using unsupervised information extraction and integration. 2003.
[3] Ramanathan V. Guha, et al. SemTag and seeker: bootstrapping the semantic web via automated semantic annotation. WWW '03, 2003.
[4] Developing a Service-Oriented Architecture to Harvest Information for the Semantic Web. 2004.
[5] J. Iria. T-Rex: A Flexible Relation Extraction Framework. 2004.
[6] Yorick Wilks, et al. Designing Adaptive Information Extraction for the Semantic Web in Amilcare. 2003.