CODA: Computer-aided ontology development architecture

This paper introduces CODA (Computer-aided Ontology Development Architecture), an architecture and associated framework that supports the transformation of unstructured and semi-structured content into RDF (Resource Description Framework) datasets. CODA is intended to support the entire process, from data extraction and transformation through to identity resolution, with the final objective of feeding semantic repositories with knowledge extracted from unstructured content. CODA is motivated by the considerable effort and the design issues involved in developing knowledge acquisition systems with content analytics frameworks such as UIMA™ (Unstructured Information Management Architecture) and GATE (General Architecture for Text Engineering). To address this, CODA extends UIMA with facilities and a powerful language for projecting and transforming UIMA-annotated content into RDF. The proposed platform is aimed at a wide range of beneficiaries, from semantic application developers to end users, who can easily plug CODA components into compliant desktop tools. We describe and discuss the features of CODA throughout the article, and we conclude by reporting on the adoption of the CODA framework in a usage scenario concerning knowledge acquisition in the agricultural domain.
