Ranking function optimization for effective Web search by genetic programming: an empirical study

Web search engines have become indispensable in daily life for finding the information we need. Although they respond quickly, their effectiveness at placing useful and relevant documents at the top of the hit list still needs improvement. In this paper, we report our experience applying genetic programming (GP) to the ranking function discovery problem, leveraging the structural information of HTML documents. Our empirical experiments using the Web track data from recent TREC conferences show that we can discover better ranking functions than existing well-known ranking strategies from IR, such as Okapi and Ptfidf. The performance is even comparable to that obtained by support vector machines.
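To make the idea of ranking function discovery concrete, the following is a minimal sketch of how a candidate ranking function might be represented as an expression tree and scored. It is not the setup used in the paper: the feature names (`tf`, `length`), the function set, the toy documents, and the use of average precision as the fitness measure are all assumptions chosen for illustration. A full GP run would additionally generate, recombine, and mutate many such trees over several generations.

```python
import operator


def pdiv(a, b):
    """Protected division: returns 1.0 when the divisor is zero."""
    return a / b if b != 0 else 1.0


# Hypothetical function set available to evolved ranking functions.
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": pdiv}


def evaluate(tree, feats):
    """Evaluate a tree that is a tuple (op, left, right), a feature name, or a constant."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCS[op](evaluate(left, feats), evaluate(right, feats))
    if isinstance(tree, str):
        return feats[tree]
    return float(tree)


def average_precision(ranked_ids, relevant):
    """Average precision of a ranked list against a set of relevant document ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def fitness(tree, docs, relevant):
    """Rank documents by the tree's score (descending) and return average precision."""
    ranked = sorted(docs, key=lambda d: evaluate(tree, docs[d]), reverse=True)
    return average_precision(ranked, relevant)


# Toy data (assumed): per-document features for a single query.
docs = {
    "d1": {"tf": 3.0, "length": 100.0},
    "d2": {"tf": 1.0, "length": 20.0},
    "d3": {"tf": 5.0, "length": 500.0},
}
relevant = {"d2"}

raw_tf = "tf"                          # candidate 1: raw term frequency
normalized_tf = ("/", "tf", "length")  # candidate 2: length-normalized term frequency

print(fitness(raw_tf, docs, relevant))
print(fitness(normalized_tf, docs, relevant))
```

On this toy data the length-normalized candidate ranks the relevant document first while the raw term frequency ranks it last, so a selection step driven by this fitness would favor the normalized tree.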
