The quality of an information retrieval system depends heavily on its retrieval function, which computes a similarity score between the query and each document in the collection. Documents are sorted by their similarity to the query, and the highest-ranked ones are assumed to be relevant. Okapi BM25 and its variants are very popular and appear to be the default retrieval function in the IR research community; many other functions are also widely used and well studied, for example Pivoted TFIDF and INQUERY. Most retrieval functions in use today are derived from probabilistic theories and then adjusted in practice for different contexts and information needs. In this paper, we propose that a good retrieval function can be discovered by a purely machine-learning approach, without relying on probabilistic theories or knowledge-based techniques. Two machine learning algorithms, Support Vector Machine (SVM) and Genetic Programming (GP), are applied to retrieval function discovery, and GP is found to be the more effective approach. The retrieval functions discovered by GP may be hard for humans to interpret, but their performance is superior to that of Okapi BM25, one of the most popular functions. When the new retrieval function is combined with query expansion techniques, retrieval performance improves significantly. Based on our empirical observations, the GP function is more reliable and effective than Okapi BM25 when query expansion techniques are used.
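Since Okapi BM25 is the baseline the GP-discovered functions are compared against, a minimal Python sketch of how such a retrieval function scores and ranks documents may help. It assumes pre-tokenized documents, the common defaults k1 = 1.2 and b = 0.75, and the non-negative IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), and it omits the query-term-frequency component. The names `bm25_scores` and `rank` are hypothetical, and this is not necessarily the exact BM25 formulation used in the paper's experiments.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Uses Okapi BM25 with the non-negative IDF variant
    log(1 + (N - n + 0.5) / (n + 0.5)). k1 and b are common defaults
    (assumptions), not values taken from the paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(d) / avgdl)
        score = 0.0
        # Repeated query terms are counted once (no query-tf component).
        for term in set(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores


def rank(query, docs, **params):
    """Return document indices sorted by descending similarity score."""
    scores = bm25_scores(query, docs, **params)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "genetic programming evolves retrieval functions".split(),
        "support vector machines for text classification".split(),
        "okapi bm25 retrieval function with query expansion".split(),
    ]
    print(rank("retrieval function".split(), docs))  # → [2, 0, 1]
```

In the framing of the abstract, a learned retrieval function would plug in at the scoring step, while sorting documents by score and treating the top-ranked ones as relevant stays the same.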