Using genetic algorithms for query reformulation

Searching for information on the web or in any kind of document collection has become one of the most frequent activities. However, user queries can be formulated in a way that hinders the retrieval of the requested information. The objective of automatic query transformation is to improve the quality of the retrieved information. This paper describes a new genetic algorithm that changes the set of terms composing a user query without user supervision, complementing an expansion process based on a morphological thesaurus. We apply a stemming process to obtain the stem of each word, for which the thesaurus provides its different forms. The set of candidate query terms is constructed by expanding each term in the original query with its morphologically related terms. The genetic algorithm then selects the terms of the final query from this candidate set. The selection is based on the retrieval results obtained when searching with different combinations of candidate terms. On standard test collections, the algorithm improves on some other approaches.
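The pipeline described above (stem each query term, expand it with its morphological variants, then let a genetic algorithm pick a subset of the candidates) can be illustrated with a minimal Python sketch. This is not the paper's implementation. The thesaurus, the toy stemmer, the four-document collection, the relevance judgments, the F1-based fitness, and every GA parameter (population size, mutation rate, elitism) are assumptions chosen for illustration.

```python
import random

# Hypothetical morphological thesaurus: stem -> surface forms (assumption).
THESAURUS = {
    "retriev": ["retrieve", "retrieval", "retrieving"],
    "genet": ["genetic", "genetics"],
}

# Toy collection and relevance judgments (assumptions, not from the paper).
DOCS = {
    "d1": "genetic algorithms for document retrieval",
    "d2": "retrieving documents with genetics inspired search",
    "d3": "how to retrieve a lost password",
    "d4": "genetic disorders in cattle",
}
RELEVANT = {"d1", "d2"}


def toy_stem(word):
    """Crude stand-in for a real stemmer: longest thesaurus stem that prefixes the word."""
    matches = [stem for stem in THESAURUS if word.startswith(stem)]
    return max(matches, key=len) if matches else word


def expand(query_terms):
    """Build the candidate term list: each query term plus its morphological variants."""
    candidates = []
    for term in query_terms:
        for form in [term] + THESAURUS.get(toy_stem(term), []):
            if form not in candidates:
                candidates.append(form)
    return candidates


def search(terms):
    """Return ids of documents that contain at least one of the terms."""
    return {doc_id for doc_id, text in DOCS.items() if set(terms) & set(text.split())}


def fitness(mask, candidates):
    """Score a term subset by the F1 of its retrieval results (assumed fitness)."""
    terms = [t for t, bit in zip(candidates, mask) if bit]
    retrieved = search(terms)
    hits = len(retrieved & RELEVANT)
    if hits == 0:
        return 0.0
    precision = hits / len(retrieved)
    recall = hits / len(RELEVANT)
    return 2 * precision * recall / (precision + recall)


def evolve(candidates, original_mask, generations=30, pop_size=12, p_mut=0.1, seed=0):
    """Bit-mask GA: each chromosome marks which candidate terms enter the final query."""
    rng = random.Random(seed)
    n = len(candidates)
    # Seed the population with the original query so elitism never does worse than it.
    population = [list(original_mask)] + [
        [rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, candidates), reverse=True)
        next_gen = population[:2]  # elitism: keep the two best unchanged
        while len(next_gen) < pop_size:
            a, b = rng.sample(population[: pop_size // 2], 2)  # parents from the top half
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
    return max(population, key=lambda m: fitness(m, candidates))


query = ["genetic", "retrieval"]
candidates = expand(query)
original_mask = [1 if t in query else 0 for t in candidates]
best = evolve(candidates, original_mask)
print("candidates:", candidates)
print("selected:  ", [t for t, bit in zip(candidates, best) if bit])
```

Seeding the initial population with the original query and keeping the best individuals in each generation means the selected query can never score below the user's original query under this fitness. The sketch scores queries against known relevance judgments, which is only one possible reading of "based on the retrieval results".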
