An evaluation of evolved term-weighting schemes in information retrieval

This paper presents an evaluation of evolved term-weighting schemes on short, medium and long TREC queries. A previously evolved global (collection-wide) term-weighting scheme is evaluated on unseen TREC data and is shown to increase mean average precision over idf. A local (within-document) evolved term-weighting scheme is then presented; it depends on the best-performing global scheme. The full evolved scheme (i.e. the combined local and global scheme) is compared to both the BM25 scheme and the Pivoted Normalisation scheme. Our results show that the local evolved solution does not perform well on some collections because of its document normalisation properties. We conclude that Okapi-tf can be tuned to interact effectively with the evolved global weighting scheme and to increase mean average precision over the standard BM25 scheme.
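As a rough illustration of the setup described above, the following sketch combines a local Okapi-tf factor with a pluggable global weight and computes average precision for one ranked list. It is not the evolved scheme from the paper, which is not given in this abstract: the function names, the `(N, df)` signature of `global_weight`, the plain `log(N/df)` idf baseline, and the defaults `k1 = 1.2`, `b = 0.75` are all assumptions made for the example.

```python
import math
from typing import Callable, Dict, Iterable, Set


def okapi_tf(tf: float, doc_len: float, avg_doc_len: float,
             k1: float = 1.2, b: float = 0.75) -> float:
    """Local (within-document) weight: the BM25-style tf factor.

    k1 and b are the tunable parameters; the defaults are common
    textbook values, assumed here rather than taken from the paper.
    """
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return tf * (k1 + 1.0) / (tf + norm)


def idf(num_docs: int, doc_freq: int) -> float:
    """Baseline global (collection-wide) weight: plain log(N / df)."""
    return math.log(num_docs / doc_freq)


def score(query: Iterable[str], doc_tf: Dict[str, int], doc_len: float,
          avg_doc_len: float, num_docs: int, doc_freq: Dict[str, int],
          global_weight: Callable[[int, int], float] = idf) -> float:
    """Sum of local * global weights over query terms found in the document.

    global_weight is a hypothetical hook: any function of (N, df) can be
    swapped in where an evolved global scheme would go.
    """
    total = 0.0
    for term in query:
        tf = doc_tf.get(term, 0)
        if tf == 0 or term not in doc_freq:
            continue
        total += okapi_tf(tf, doc_len, avg_doc_len) * global_weight(
            num_docs, doc_freq[term])
    return total


def average_precision(ranked: Iterable[str], relevant: Set[str]) -> float:
    """Average precision for one query; its mean over queries is MAP."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0
```

Comparing two global schemes under this sketch amounts to ranking documents by `score` once with each `global_weight` and comparing the mean of `average_precision` over the query set.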
