A hybrid genetic algorithm for large-scale information retrieval

Artificial intelligence tools are seldom used for information retrieval because classical approaches already address the problem efficiently. For large-scale information retrieval the situation is different and may call for more powerful methods. In this paper we show that, for large-scale collections, heuristic search techniques outperform conventional approaches to information retrieval. To support this claim, we designed and implemented a genetic algorithm and then a hybrid genetic algorithm for information retrieval. We compare the effectiveness of both algorithms with that of a classical method through empirical tests on the SMART collections and on random benchmarks. Both algorithms outperform the classical approach on large data sets, and the hybrid genetic algorithm yields the best performance in terms of solution quality and runtime.