This paper describes a method for automatically building a new thesaurus by reusing previously collected information. The method draws on several resources (a text collection, a set of source thesauri, and other linguistic resources) and applies a different technique in each phase of the process. First, indexing techniques extract from the text collection the initial set of terms of interest for the new thesaurus. Next, these terms are looked up in the source thesauri, which supply the initial structure of the new thesaurus. Finally, the new thesaurus is enriched by searching for new relationships among its terms: relationships are first detected using similarity measures and then assigned a type (equivalence, hierarchy or associativity) using different linguistic resources. We evaluated the system by comparing the results obtained with and without the thesaurus in an information retrieval task proposed by the Cross-Language Evaluation Forum (CLEF). These experiments showed a clear improvement in performance when the thesaurus was used.
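The relationship-detection step can be illustrated with a minimal sketch. The abstract does not say which similarity measures are used, so cosine similarity over raw term-document frequencies is assumed here purely for illustration. The function names, the whitespace tokenizer and the 0.5 threshold are likewise hypothetical and are not taken from the paper. The sketch only proposes untyped candidate pairs; assigning a type (equivalence, hierarchy or associativity) would require the linguistic resources mentioned above.

```python
import math
from collections import Counter
from itertools import combinations


def term_vectors(docs, terms):
    """Map each term to its raw frequency in every document (assumed representation)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    return {term: [c[term] for c in counts] for term in terms}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_relationships(docs, terms, threshold=0.5):
    """Return (term_a, term_b, score) for term pairs whose similarity meets the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    vectors = term_vectors(docs, terms)
    pairs = []
    for a, b in combinations(sorted(terms), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs


if __name__ == "__main__":
    docs = ["car engine repair", "car engine oil", "river bank water"]
    for a, b, score in candidate_relationships(docs, {"car", "engine", "river"}):
        print(a, b, round(score, 3))
```

In this toy collection, "car" and "engine" occur in the same documents and are proposed as related, while "river" shares no documents with either and is not paired.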