Wikipedia-based Approach for Linking Ontology Concepts to their Realisations in Text

A novel method for automatically associating ontological concepts with their realisations in text is presented. The method was developed within the Papyrus project to annotate texts and audio transcripts with a set of relevant concepts from the Papyrus News Ontology. To avoid a strong dependency on a specific ontology, the annotation process starts with a Wikipedia-based annotation of news items: the most relevant keywords are detected, and the Wikipedia pages that best describe their actual meaning are identified. In a later step, this annotation is translated into an ontology-based one: keywords are connected to the most appropriate ontology classes on the basis of a relatedness measure that relies on Wikipedia knowledge. The Wikipedia annotation provides a domain-independent abstraction layer that simplifies the adaptation of the approach to other domains and ontologies. Evaluation was performed on a set of manually annotated news items, yielding an F1 score of 58% for the identification of relevant Wikipedia pages and 64% for the identification of relevant ontology concepts.
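The second step, translating a Wikipedia annotation into an ontology annotation, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the method itself: the relatedness measure is replaced by a simple Jaccard overlap of page link sets, each ontology class is assumed to be represented by one Wikipedia page, and all names and data (`relatedness`, `map_to_ontology`, `PAGE_LINKS`, `CLASS_PAGES`, the class labels and the threshold) are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def relatedness(links_a: Set[str], links_b: Set[str]) -> float:
    """Jaccard overlap of two link sets.

    Assumption: a stand-in for a Wikipedia-based relatedness measure,
    not the measure used in the described method.
    """
    if not links_a or not links_b:
        return 0.0
    return len(links_a & links_b) / len(links_a | links_b)


def map_to_ontology(
    page: str,
    page_links: Dict[str, Set[str]],
    class_pages: Dict[str, str],
    threshold: float = 0.1,  # hypothetical cut-off for "no suitable class"
) -> Tuple[Optional[str], float]:
    """Return the ontology class most related to a Wikipedia page, with its score."""
    best_class: Optional[str] = None
    best_score = 0.0
    source_links = page_links.get(page, set())
    for cls, cls_page in class_pages.items():
        score = relatedness(source_links, page_links.get(cls_page, set()))
        if score > best_score:
            best_class, best_score = cls, score
    if best_score < threshold:
        return None, best_score
    return best_class, best_score


# Toy data, invented for illustration only.
PAGE_LINKS: Dict[str, Set[str]] = {
    "Lionel Messi": {"Football", "FC Barcelona", "Argentina", "Sport"},
    "Association football": {"Football", "Sport", "FIFA", "FC Barcelona"},
    "Politics": {"Government", "Election", "Parliament"},
}

# Hypothetical ontology classes, each tied to one representative Wikipedia page.
CLASS_PAGES: Dict[str, str] = {
    "Sport": "Association football",
    "Politics": "Politics",
}

if __name__ == "__main__":
    print(map_to_ontology("Lionel Messi", PAGE_LINKS, CLASS_PAGES))
```

Because the keyword is first resolved to a Wikipedia page and only then matched against classes, adapting the sketch to another ontology would only require a different `CLASS_PAGES` mapping; the keyword detection and disambiguation step would stay unchanged.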