A fuzzy model of document retrieval systems

Abstract: This paper is concerned with the organization and retrieval of records in document retrieval systems that admit imprecision, in the form of fuzziness, in document characterization and retrieval rules. A mathematical model for such systems, based on the theory of fuzzy sets, is introduced. A document retrieval system, as defined in this paper, is a quadruple (X, D, Q, γ), where X is a collection of document descriptions (also referred to as index records, or records); D is the descriptor set; Q is a query set; and γ: Q × X → [0, 1], called the matching function, assigns to each pair (q, x), where q ∈ Q and x ∈ X, a number γ(q, x) in the interval [0, 1], called the matching index for the query q and the document description x. In our system model, each document description x is defined as a fuzzy set in the descriptor set D. As a fuzzy subset of D, each x is characterized by a membership function μ_x: D → [0, 1], where μ_x(d), representing the grade of membership of d in x, is referred to as the index weight of the descriptor d for the document description x. The retrieval response of the system is defined in terms of the matching function γ. More specifically, given a query q, the index record retrieval response, f(q), is defined to be a fuzzy set in X whose membership function is given by μ_{f(q)}(x) = γ(q, x). To deal with the problems of data organization in our conceptual model, the conventional concept of a list is extended to that of a fuzzy list. Specifically, L(d), the fuzzy list corresponding to a descriptor d, is defined as a fuzzy set in the document description set X whose membership function is given by μ_{L(d)}(x) = μ_x(d). In this way, the notion of an inverted file structure can be extended to the fuzzy data in our retrieval model.
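
The definitions above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation. The toy collection, the descriptor names, and the helper names (mu, fuzzy_list, gamma, response) are hypothetical. The abstract does not specify the form of the matching function γ, so the sketch assumes that a query is a list of descriptors read conjunctively and that γ takes the minimum index weight over them; other choices of γ fit the same model.

```python
from typing import Dict, List

# A fuzzy set is stored as element -> grade of membership in [0, 1].
FuzzySet = Dict[str, float]

# Hypothetical toy collection X: each document description x is a fuzzy
# set in the descriptor set D (descriptor -> index weight mu_x(d)).
X: Dict[str, FuzzySet] = {
    "x1": {"fuzzy": 0.9, "retrieval": 0.6},
    "x2": {"retrieval": 0.8, "database": 0.5},
    "x3": {"fuzzy": 0.3, "database": 1.0},
}

# Descriptor set D, collected from the toy collection.
D: List[str] = sorted({d for x in X.values() for d in x})


def mu(x_id: str, d: str) -> float:
    """Index weight mu_x(d); a descriptor absent from x has grade 0."""
    return X[x_id].get(d, 0.0)


def fuzzy_list(d: str) -> FuzzySet:
    """L(d): fuzzy set in X with mu_{L(d)}(x) = mu_x(d).

    Records with grade 0 are omitted from the stored list.
    """
    return {x_id: x[d] for x_id, x in X.items() if x.get(d, 0.0) > 0.0}


# Fuzzy inverted file: one fuzzy list per descriptor.
inverted_file: Dict[str, FuzzySet] = {d: fuzzy_list(d) for d in D}


def gamma(q: List[str], x_id: str) -> float:
    """Matching index gamma(q, x).

    ASSUMPTION (not from the paper): q is a conjunction of descriptors
    and gamma is the minimum index weight; an empty query matches fully.
    """
    return min((mu(x_id, d) for d in q), default=1.0)


def response(q: List[str]) -> FuzzySet:
    """f(q): fuzzy set in X with mu_{f(q)}(x) = gamma(q, x)."""
    return {x_id: gamma(q, x_id) for x_id in X}
```

For example, response(["fuzzy", "retrieval"]) grades every record in X by its weakest index weight among the two descriptors, while inverted_file["fuzzy"] holds only the records with a nonzero weight for that descriptor, each paired with that weight.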