A Semantic Structure for Digital Theses Collection Based on Domain Annotations

Search performance in digital libraries can be greatly improved by using Natural Language Processing (NLP) tools to describe the data and create new metadata. In this paper, we present a methodology that uses domain-specific knowledge to improve user requests. This domain knowledge consists of concepts extracted from the documents themselves and used as "semantic metadata tags" to annotate XML documents. We describe the process followed to define new XML semantic metadata and to add it to a digital library of scientific theses. From these new metadata, we also build an ontology that completes the annotation process. Effective information retrieval is then achieved by an intelligent system based on our XML semantic metadata and a domain ontology.
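The annotate-then-retrieve pipeline summarized above can be illustrated with a minimal sketch. It assumes a hypothetical XML thesis record with `title` and `abstract` elements, a small hand-written concept vocabulary standing in for the NLP extraction step, and a toy ontology expressed as narrower-concept links. The element names `semantic` and `concept`, the function names, and all concept labels are illustrative assumptions, not the schema or system used in this work.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical domain vocabulary: concept label -> surface terms that signal it.
# A stand-in for NLP-based concept extraction, not the paper's extractor.
VOCABULARY = {
    "InformationRetrieval": ["information retrieval", "search engine"],
    "Ontology": ["ontology", "ontologies"],
    "NaturalLanguageProcessing": ["natural language processing", "nlp"],
}

# Hypothetical ontology fragment: concept -> its narrower concepts.
NARROWER = {
    "ArtificialIntelligence": ["NaturalLanguageProcessing", "Ontology"],
}


def extract_concepts(text: str) -> list[str]:
    """Return the vocabulary concepts whose terms occur as whole words in text."""
    lowered = text.lower()
    found = []
    for concept, terms in VOCABULARY.items():
        if any(re.search(r"\b" + re.escape(t) + r"\b", lowered) for t in terms):
            found.append(concept)
    return found


def annotate(thesis: ET.Element) -> ET.Element:
    """Append a <semantic> block of <concept> tags built from title and abstract."""
    text = " ".join(thesis.findtext(tag, default="") for tag in ("title", "abstract"))
    semantic = ET.SubElement(thesis, "semantic")
    for concept in extract_concepts(text):
        ET.SubElement(semantic, "concept").text = concept
    return thesis


def expand(concept: str) -> set[str]:
    """Return the concept plus all of its narrower concepts, transitively."""
    result, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in result:
            result.add(current)
            stack.extend(NARROWER.get(current, []))
    return result


def search(theses: list[ET.Element], concept: str) -> list[str]:
    """Return ids of theses tagged with the concept or any narrower concept."""
    wanted = expand(concept)
    return [t.get("id") for t in theses if wanted & {c.text for c in t.iter("concept")}]


# Usage with two toy records.
records = [
    ET.fromstring('<thesis id="t1"><title>Ontologies for digital libraries</title>'
                  '<abstract>We build a domain ontology.</abstract></thesis>'),
    ET.fromstring('<thesis id="t2"><title>Search engine evaluation</title>'
                  '<abstract>Ranking in information retrieval.</abstract></thesis>'),
]
for record in records:
    annotate(record)
print(search(records, "ArtificialIntelligence"))
```

Expanding a query concept through its narrower concepts is one simple way an ontology can widen a request: here a query for `ArtificialIntelligence` returns `t1` even though that record is only tagged `Ontology`. A real system would replace the keyword table with proper NLP extraction and a richer ontology.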