A Fuzzy-Rough Method for Concept-Based Document Expansion

This paper develops a novel fuzzy-rough hybridization approach to concept-based document expansion, with the aim of improving the quality of text information retrieval. First, in contrast to traditional document representations, a given set of text documents is represented as an incomplete information system. To discover relevant keywords that should be added to a document, the weights of terms that do not occur in the document are treated as missing rather than zero. Fuzzy sets are used to handle the real-valued weights in the term vectors. Rough sets are then used to extract, from this incomplete information system, the potentially associated keywords that together convey a concept for text retrieval. Finally, by incorporating a nearest-neighbor mechanism, the missing weights of a document's extracted keywords are filled in with the corresponding weights of the most similar document. In this way, the documents in the original text dataset are expanded, while the total number of keywords is reduced. Experiments are conducted on a subset of the Reuters-21578 collection. Because the concept-based method can identify potentially useful information and add it to each document, retrieval performance in terms of recall is greatly improved.
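The final nearest-neighbor filling step can be illustrated with a minimal sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: the names `similarity` and `expand`, the toy weights, the use of NaN to mark a missing weight, and the similarity measure (the mean of 1 - |x - y| over terms observed in both documents, with weights assumed to lie in [0, 1]). The rough-set keyword extraction is not reproduced; the indices of the extracted keywords are simply passed in as `keyword_idx`.

```python
import math

MISSING = math.nan  # assumed marker for the weight of a term absent from a document


def similarity(a, b):
    """Mean of 1 - |x - y| over terms observed in both documents (assumed measure)."""
    shared = [(x, y) for x, y in zip(a, b)
              if not math.isnan(x) and not math.isnan(y)]
    if not shared:
        return 0.0
    return sum(1.0 - abs(x - y) for x, y in shared) / len(shared)


def expand(docs, keyword_idx):
    """Fill each missing keyword weight from the most similar document that has it."""
    expanded = [row[:] for row in docs]  # leave the input untouched
    for i, row in enumerate(docs):
        for j in keyword_idx:
            if not math.isnan(row[j]):
                continue  # weight already present, nothing to fill
            donors = [k for k in range(len(docs))
                      if k != i and not math.isnan(docs[k][j])]
            if donors:
                best = max(donors, key=lambda k: similarity(row, docs[k]))
                expanded[i][j] = docs[best][j]
    return expanded


# Toy term-weight vectors: rows are documents, columns are terms.
docs = [
    [0.8, 0.6, MISSING],
    [0.7, 0.5, 0.9],
    [0.1, MISSING, 0.2],
]
expanded = expand(docs, keyword_idx=[1, 2])
```

In this toy data the first document is closest to the second, so its missing third weight is copied from there. Similarities are computed on the original vectors, so a value filled in for one document does not influence the filling of another.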