Ontology-driven content extraction using interlingual annotation of texts in the OMNIA project

OMNIA is an ongoing project that aims to retrieve images accompanied by multilingual texts. In this paper, we propose a generic (language- and domain-independent) method to extract conceptual information from such texts and from spontaneous user requests. First, texts are labelled with interlingual annotations; then a generic extractor, which takes a domain ontology as a parameter, extracts the relevant conceptual information. We also present an implementation, together with a first experiment and preliminary results.
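The two-step pipeline can be illustrated with a minimal sketch. It is not the OMNIA implementation: the lexicon, the lexeme identifiers, the concept names, and the functions `annotate` and `extract_concepts` are all hypothetical and chosen only for illustration. The sketch assumes that annotation attaches language-independent lexemes to each word and that the domain ontology is reduced to a lookup from lexemes to concepts.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedToken:
    surface: str       # word as it appears in the text
    language: str      # language code of the source text
    lexemes: set       # candidate interlingual lexemes (hypothetical identifiers)


def annotate(text, language, lexicon):
    """Step 1: label each word with its interlingual lexemes, if any."""
    tokens = []
    for word in text.lower().split():
        word = word.strip(".,;:!?")
        lexemes = lexicon.get((language, word), set())
        tokens.append(AnnotatedToken(word, language, set(lexemes)))
    return tokens


def extract_concepts(tokens, ontology_index):
    """Step 2: generic extractor; the domain ontology is passed as a parameter."""
    found = []
    for position, token in enumerate(tokens):
        for lexeme in sorted(token.lexemes):
            concept = ontology_index.get(lexeme)
            if concept is not None:
                found.append((position, concept))
    return found


# Toy data, assumed for illustration only.
LEXICON = {
    ("en", "church"): {"church(icl>building)"},
    ("fr", "église"): {"church(icl>building)"},
    ("en", "river"): {"river(icl>stream)"},
}
ONTOLOGY_INDEX = {
    "church(icl>building)": "Building",
    "river(icl>stream)": "Waterway",
}

english = extract_concepts(annotate("A church near the river", "en", LEXICON), ONTOLOGY_INDEX)
french = extract_concepts(annotate("Une église", "fr", LEXICON), ONTOLOGY_INDEX)
print(english)
print(french)
```

Because the extractor only sees interlingual lexemes, the English and French inputs reach the same concept without any language-specific rule, and a different domain can be handled by passing a different ontology index.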
