Large Population or Many Generations for Genetic Algorithms? Implications in Information Retrieval

Artificial intelligence models can be used to improve the performance of information retrieval (IR) systems, and genetic algorithms (GAs) are one example of such a model. This paper presents an application of GAs as a relevance feedback method that aims to improve document representation and indexing. In this particular form of GA, various document descriptions compete with each other, and a better indexing of the collection is sought through reproduction, crossover, and mutation operations. Within this paradigm, we search for the optimal balance between two genetic parameters: the population size and the number of generations. We try to identify the optimal parameter choice both through experiments on the CACM and CISI collections and through a theoretical analysis that explains the experimental results. The general conclusion is that larger populations tend to have a better chance of significantly improving retrieval effectiveness.
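The trade-off between population size and number of generations can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that a document description is a binary vector of index terms, that the operators are the textbook ones (fitness-proportional reproduction, one-point crossover, bit-flip mutation), and that fitness is a toy overlap with a hypothetical target description rather than a retrieval-based measure. The names `evolve`, `overlap`, and `TARGET` are illustrative only.

```python
import random


def evolve(fitness, n_genes, pop_size, generations, p_mut=0.01, seed=0):
    """Generational GA over binary vectors; returns the best individual seen."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Reproduction: fitness-proportional selection. Fitness is assumed to be
        # non-negative; the small offset keeps the total weight above zero.
        weights = [fitness(ind) + 1e-9 for ind in pop]
        parents = rng.choices(pop, weights=weights, k=pop_size)
        nxt = []
        for i in range(0, pop_size, 2):
            a, b = parents[i], parents[(i + 1) % pop_size]
            cut = rng.randrange(1, n_genes)  # one-point crossover position
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Mutation: flip each bit independently with probability p_mut.
                nxt.append([g ^ 1 if rng.random() < p_mut else g for g in child])
        pop = nxt[:pop_size]
        best = max(pop + [best], key=fitness)
    return best


# Hypothetical "ideal" description over 32 binary index terms (assumption).
TARGET = [1, 0] * 16


def overlap(ind):
    """Toy fitness: number of positions that agree with TARGET."""
    return sum(1 for g, t in zip(ind, TARGET) if g == t)


# Same budget of about 1000 generated individuals, split two ways.
large_pop = evolve(overlap, len(TARGET), pop_size=100, generations=10)
many_gens = evolve(overlap, len(TARGET), pop_size=10, generations=100)
print(overlap(large_pop), overlap(many_gens))
```

Holding the product `pop_size * generations` roughly constant is one simple way to compare the two settings at equal cost. A toy fitness such as this one says nothing about which setting wins on CACM or CISI; that question is what the paper's experiments and analysis address.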
