Information Retrieval in Digital Theses Based on Natural Language Processing Tools

Search performance can be substantially improved by using Natural Language Processing (NLP) tools to describe data, creating new metadata and domain ontologies. We present a methodology that uses domain-specific knowledge to improve user queries. This knowledge consists of concepts extracted from the documents themselves and used as “semantic metadata tags” to annotate XML documents. We describe the process followed to define new XML semantic metadata and add them to a digital library of scientific theses. A domain ontology is also constructed from these new metadata. Effective information retrieval is then achieved by an intelligent system based on the XML semantic metadata and the domain ontology.
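As a rough illustration of annotating XML documents with concept-based metadata tags and then searching on them, the following minimal sketch may help. It is not the system described here: the element names `semanticMetadata` and `concept`, the concept list, and the functions `annotate` and `search` are all assumptions made for the example, and plain substring matching stands in for NLP-based concept extraction.

```python
import xml.etree.ElementTree as ET

# Hypothetical domain concepts. In the approach summarized above these would
# be extracted with NLP tools; that step is not reproduced here.
DOMAIN_CONCEPTS = ["ontology", "information retrieval", "metadata"]


def annotate(xml_text, concepts=DOMAIN_CONCEPTS):
    """Return the XML with a <semanticMetadata> element listing matched concepts."""
    root = ET.fromstring(xml_text)
    # Gather all text in the document, lowercased for naive substring matching.
    text = " ".join(root.itertext()).lower()
    meta = ET.SubElement(root, "semanticMetadata")  # assumed tag name
    for concept in concepts:
        if concept in text:
            ET.SubElement(meta, "concept").text = concept
    return ET.tostring(root, encoding="unicode")


def search(annotated_docs, query_concept):
    """Return the indexes of documents tagged with the query concept."""
    hits = []
    for i, doc in enumerate(annotated_docs):
        root = ET.fromstring(doc)
        tags = [c.text for c in root.iter("concept")]
        if query_concept in tags:
            hits.append(i)
    return hits


thesis = (
    "<thesis><title>Building an ontology for search</title>"
    "<abstract>We study metadata.</abstract></thesis>"
)
annotated = annotate(thesis)
```

Calling `search([annotated], "ontology")` then matches on the added tags rather than on the full text, which is the general idea behind querying semantic metadata; a domain ontology could further expand `query_concept` to related concepts before matching.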

[18]  John B. Smith,et al.  Author's Argumentation Assistant (AAA): A Hypertext-Based Authoring Tool for Argumentative Texts , 1992, ECHT.