Intelligent text information retrieval systems need to deal with the semantics of the content of their text bases. To meet this requirement, it is necessary to extract semantic information from the documents and to be able to draw inferences from it. A methodology to semi-automatically transform a traditional web IR system into a semantics-aware one is proposed. The methodology consists of three major steps: construction of an appropriate semantic ontology; enrichment of the texts with semantic information; and construction of the inference engine. To create an adequate ontology, natural language processing techniques such as partial parsers and lexical resources (WordNet) are applied. Documents are enriched with semantic information using the output of the partial parsers and the obtained ontology. Finally, an inference engine based on a declarative programming language, Prolog, is used as the basis for the reasoning process. An application of this methodology to the legal web information retrieval system of the Portuguese Attorney General's Office is described.
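The three steps can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the system described: Python stands in for Prolog, a hand-written dictionary stands in for the WordNet-derived ontology, a list of terms per document stands in for partial-parser output, and the concept names, document identifiers and the `enrich`, `is_a_star` and `about` helpers are all hypothetical.

```python
from typing import Dict, List, Set

# Step 1 (hypothetical toy ontology): concept -> direct parent ("is-a").
# The methodology derives this with partial parsers and WordNet; here it is
# hand-written purely for illustration and assumed to be acyclic.
IS_A: Dict[str, str] = {
    "disability_pension": "pension",
    "pension": "social_benefit",
    "social_benefit": "right",
    "public_servant": "person",
}
KNOWN_CONCEPTS: Set[str] = set(IS_A) | set(IS_A.values())


def enrich(doc_terms: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Step 2 sketch: keep only extracted terms that name ontology concepts.

    The input format stands in for partial-parser output (an assumption)."""
    return {doc: {t for t in terms if t in KNOWN_CONCEPTS}
            for doc, terms in doc_terms.items()}


def is_a_star(concept: str, target: str) -> bool:
    """Reflexive-transitive is-a, mirroring the Prolog rules
    is_a_star(C, C).  is_a_star(C, T) :- is_a(C, P), is_a_star(P, T)."""
    while True:
        if concept == target:
            return True
        if concept not in IS_A:
            return False
        concept = IS_A[concept]


def about(annotations: Dict[str, Set[str]], target: str) -> List[str]:
    """Step 3 sketch: answer the query about(Doc, target) by inference."""
    return sorted(doc for doc, concepts in annotations.items()
                  if any(is_a_star(c, target) for c in concepts))


# Hypothetical documents with hypothetical extracted terms.
docs = {
    "opinion_1": ["disability_pension", "public_servant", "request"],
    "opinion_2": ["salary"],
    "opinion_3": ["pension"],
}
annotations = enrich(docs)
print(about(annotations, "social_benefit"))  # ['opinion_1', 'opinion_3']
print(about(annotations, "person"))          # ['opinion_1']
```

Storing only the direct concept annotations on each document and deriving broader concepts at query time mirrors the division of labour in the methodology: enrichment records what the parser found, while the inference engine supplies what follows from the ontology.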