Indexing Documents by Discourse and Semantic Contents from Automatic Annotations of Texts

The model proposed here aims to automatically build a semantic metatext structure for texts, so that discourse and semantic information can be searched for and extracted from texts indexed in this way. The model is built from two engines. The first, EXCOM (Djioua et al., 2006), is an XML-based system that automatically annotates texts according to discourse and semantic categories. The second, MOCXE, uses the semantic annotations generated by EXCOM to create a semantic inverted index that can retrieve documents relevant to queries associated with discourse and semantic categories such as definition, quotation, causality, and relations between concepts. We illustrate the approach with an example of a “connection” relation between concepts in French. The model is general enough to be carried over to other languages.

General presentation

Existing web search engines that index texts generate representations as a set of simple and complex