The effects of fitness functions on genetic programming-based ranking discovery for Web search

Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, our prior research used only one fitness function for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since choosing a proper fitness function is well known to be critical to the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience contrasting different fitness function designs for GP-based learning on a very large Web corpus. Our results indicate that the design of the fitness function is instrumental in improving performance. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
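The abstract does not name the fitness functions that were compared. As a minimal sketch, assuming that a fitness function scores a candidate ranking function by a retrieval metric averaged over training queries, the Python below contrasts two common metric choices: average precision and precision at k. The feature names `x` and `y`, the toy documents, the relevance judgments, and both candidate functions are hypothetical illustrations, not taken from the article.

```python
from functools import partial
from typing import Callable, Dict, List, Sequence, Set, Tuple

Features = Dict[str, float]
# One training query: candidate documents (id -> features) and the judged-relevant ids.
Query = Tuple[Dict[str, Features], Set[str]]
Metric = Callable[[Sequence[str], Set[str]], float]


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of one ranked list."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant)


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def fitness(rank_fn: Callable[[Features], float], queries: List[Query], metric: Metric) -> float:
    """Mean metric over training queries after ranking each query's documents with rank_fn."""
    scores = []
    for docs, relevant in queries:
        # Highest score first; Python's sort stays stable for ties even with reverse=True.
        ranking = sorted(docs, key=lambda d: rank_fn(docs[d]), reverse=True)
        scores.append(metric(ranking, relevant))
    return sum(scores) / len(scores)


# Hypothetical toy data: one query, six documents, two made-up features.
toy_queries: List[Query] = [(
    {
        "d1": {"x": 6, "y": 5},
        "d2": {"x": 5, "y": 6},
        "d3": {"x": 2, "y": 4},
        "d4": {"x": 1, "y": 3},
        "d5": {"x": 4, "y": 2},
        "d6": {"x": 3, "y": 1},
    },
    {"d1", "d3", "d4"},
)]


# Two hypothetical candidate trees, written as plain Python functions.
def candidate_a(f: Features) -> float:
    return f["x"]


def candidate_b(f: Features) -> float:
    return f["y"] + 0.1 * f["x"]


p_at_1 = partial(precision_at_k, k=1)

if __name__ == "__main__":
    for name, fn in [("A", candidate_a), ("B", candidate_b)]:
        ap = round(fitness(fn, toy_queries, average_precision), 4)
        print(name, ap, fitness(fn, toy_queries, p_at_1))
    # prints:
    # A 0.6333 1.0
    # B 0.6389 0.0
```

On this toy query, precision at 1 prefers candidate A (it places a relevant document first), while average precision slightly prefers candidate B (it ranks all relevant documents near the top). Under selection, the two fitness functions would therefore favor different individuals, which is the kind of effect the article sets out to measure.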
