In this paper, a novel fuzzy-rough hybridization approach is developed for concept-based document expansion to enhance the quality of text information retrieval. First, unlike traditional document representations, a given set of text documents is represented as an incomplete information system. To discover the relevant keywords to be added, the weights of terms that do not occur in a document are treated as missing rather than zero. Fuzzy sets are used to handle the real-valued weights in the term vectors. Rough sets are then used to extract, from this incomplete information system, the potentially associated keywords that together convey a concept for text retrieval. Finally, by incorporating a nearest-neighbor mechanism, the missing weights of the extracted keywords in a document are filled with the corresponding weights from the most similar document. The documents in the original text dataset are thus expanded, while the total number of keywords is reduced. Experiments are conducted on a subset of the Reuters-21578 collection. Because the concept-based method can identify potentially useful information and add it to each document, retrieval performance in terms of recall is greatly improved.
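The final step, filling missing weights from the most similar document, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, so the `similarity` function below (an inverse mean absolute difference over terms observed in both documents), the function name `expand`, and the toy weights are all assumptions made for illustration. Missing weights are represented as `None`, and the `keywords` argument stands in for the keyword columns that the rough-set step would have extracted.

```python
def similarity(a, b):
    """Assumed measure: 1 / (1 + mean absolute difference) over the
    terms whose weights are observed (not None) in both documents."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    mean_diff = sum(abs(x - y) for x, y in shared) / len(shared)
    return 1.0 / (1.0 + mean_diff)


def expand(docs, keywords):
    """Fill missing weights in the given keyword columns by copying the
    weight from the most similar document that has that weight observed."""
    expanded = [list(d) for d in docs]
    for i, doc in enumerate(docs):
        for k in keywords:
            if doc[k] is not None:
                continue  # weight already present, nothing to fill
            # Only documents with an observed weight for keyword k can donate.
            candidates = [j for j in range(len(docs))
                          if j != i and docs[j][k] is not None]
            if not candidates:
                continue  # no donor available, leave the weight missing
            best = max(candidates, key=lambda j: similarity(doc, docs[j]))
            expanded[i][k] = docs[best][k]
    return expanded


# Toy term-weight vectors (rows: documents, columns: terms); None = missing.
docs = [
    [0.8, 0.6, None],
    [0.7, 0.5, 0.9],
    [0.1, None, 0.2],
]
result = expand(docs, keywords=[1, 2])
```

In this toy data, the first document receives its missing third weight from the second document, which is its closest neighbor under the assumed measure. Similarities are computed on the original vectors rather than on the partially filled ones, so the result does not depend on the order in which documents are processed.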