Applying Genetic Algorithms to Information Retrieval Systems Via Relevance Feedback

Genetic programming is applied to a weighted (i.e., fuzzy) information retrieval system in order to improve weighted Boolean query formulation via relevance feedback. This approach brings together concepts from information retrieval, fuzzy set theory, and genetic programming. Documents are viewed as vectors of weights for the index terms. A weighted Boolean query, viewed as a parse tree, is a chromosome in the genetic algorithm sense. Through the mechanisms of genetic programming, the weighted query is modified in order to improve precision and recall. Relevance feedback is incorporated, in part, via user-defined measures over a trial set of documents. The fitness of a candidate query can be expressed directly as a function of the relevance of the retrieved set. Preliminary results based on a testbed are given. The form of the fitness function has a significant effect on performance, and an appropriate fitness function takes into account relevance based on topicality (and perhaps other factors).
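
As an illustration of the representation described above, the following minimal sketch models a weighted Boolean query as a parse tree, evaluates it against documents given as term-weight vectors, and scores the query by the precision and recall of the set it retrieves. Everything specific in it is an assumption made for illustration and is not taken from the paper: the names Term, Op, evaluate, and fitness, the use of min, max, and complement for AND, OR, and NOT, the treatment of a query weight as a multiplier on the document weight, the retrieval threshold, and the linear combination of precision and recall weighted by alpha.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Union

# A document is a vector of index-term weights in [0, 1], stored sparsely.
Doc = Dict[str, float]


@dataclass
class Term:
    """Leaf of the parse tree: an index term with a query weight (hypothetical)."""
    name: str
    weight: float


@dataclass
class Op:
    """Internal node: a Boolean operator over child subtrees (hypothetical)."""
    kind: str  # "AND", "OR", or "NOT"
    children: List["Node"]


Node = Union[Term, Op]


def evaluate(node: Node, doc: Doc) -> float:
    """Return the degree in [0, 1] to which doc matches the query tree."""
    if isinstance(node, Term):
        # Assumption: the query weight scales the document's term weight.
        return node.weight * doc.get(node.name, 0.0)
    scores = [evaluate(child, doc) for child in node.children]
    if node.kind == "AND":
        return min(scores)       # fuzzy intersection
    if node.kind == "OR":
        return max(scores)       # fuzzy union
    if node.kind == "NOT":
        return 1.0 - scores[0]   # fuzzy complement
    raise ValueError(f"unknown operator: {node.kind}")


def fitness(query: Node, docs: List[Doc], relevant: Set[int],
            threshold: float = 0.25, alpha: float = 0.5) -> float:
    """Score a candidate query from user relevance judgments on a trial set.

    Assumption: a document is retrieved when its match degree reaches
    threshold, and fitness is alpha * precision + (1 - alpha) * recall.
    """
    retrieved = {i for i, d in enumerate(docs) if evaluate(query, d) >= threshold}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return alpha * precision + (1 - alpha) * recall


# Tiny trial set; the user has judged documents 0 and 2 to be relevant.
docs = [
    {"fuzzy": 0.9, "retrieval": 0.8},
    {"fuzzy": 0.2, "genetic": 0.7},
    {"retrieval": 0.6},
]
relevant = {0, 2}

broad = Op("OR", [Term("fuzzy", 1.0), Term("retrieval", 0.5)])
narrow = Op("AND", [Term("fuzzy", 1.0), Term("retrieval", 1.0)])
print(fitness(broad, docs, relevant), fitness(narrow, docs, relevant))
```

In a genetic programming loop, trees such as broad and narrow would form the population, with crossover exchanging subtrees and mutation altering operators, terms, or weights; a fitness function of this kind would then decide which candidate queries survive. Changing alpha or the threshold changes which queries are favored, which is one simple way to see why the form of the fitness function matters.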