Using genetic algorithms to evolve a population of topical queries

Systems for searching the Web based on thematic contexts can be built on top of a conventional search engine and benefit from the huge amount of content as well as from the functionality available through the search engine interface. The quality of the material collected by such systems is highly dependent on the vocabulary used to generate the search queries. In this scenario, selecting good query terms can be seen as an optimization problem where the objective function is based on the effectiveness of a query in retrieving relevant material. Some characteristics of this optimization problem are: (1) the high dimensionality of the search space, where candidate solutions are queries and each term corresponds to a different dimension, (2) the existence of acceptable suboptimal solutions, (3) the possibility of finding multiple solutions, and in many cases (4) the quest for novelty. This article describes optimization techniques based on Genetic Algorithms to evolve "good query terms" in the context of a given topic. The proposed techniques place emphasis on searching for novel material that is related to the search context. We discuss the use of a mutation pool to allow the generation of queries with new terms, study the effect of different mutation rates on the exploration of query space, and discuss the use of a specially developed fitness function that favors the construction of queries containing novel but related terms.
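The ideas named in the abstract (a population of queries, a mutation pool that supplies new terms, a mutation rate, and a fitness function that rewards novel but related material) can be illustrated with a small sketch. This is not the article's implementation: the toy in-memory corpus stands in for a search engine, and the function names (`retrieve`, `fitness`, `mutate`, `crossover`, `evolve`), the Jaccard-based relatedness measure, the novelty ratio, and all parameter values are assumptions chosen only for illustration.

```python
import random

def retrieve(query, documents):
    """Stand-in for a search engine: documents sharing a term with the query."""
    terms = set(query)
    return [doc for doc in documents if doc & terms]

def fitness(query, context, documents):
    """Hypothetical fitness: mean of relatedness * novelty over retrieved documents."""
    results = retrieve(query, documents)
    if not results:
        return 0.0
    scores = []
    for doc in results:
        relatedness = len(doc & context) / len(doc | context)  # Jaccard overlap
        novelty = len(doc - context) / len(doc)  # share of terms new to the context
        scores.append(relatedness * novelty)
    return sum(scores) / len(scores)

def mutate(query, pool, rate, rng):
    """Replace each term with a random term from the mutation pool with probability `rate`."""
    return [rng.choice(pool) if rng.random() < rate else term for term in query]

def crossover(a, b, rng):
    """Single-point crossover between two equal-length queries."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]

def evolve(context, documents, pop_size=8, query_len=3,
           generations=10, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pool = sorted(context)  # the mutation pool starts as the context vocabulary
    population = [rng.sample(pool, query_len) for _ in range(pop_size)]

    def score(query):
        return fitness(query, context, documents)

    for _ in range(generations):
        # Grow the mutation pool with terms found in retrieved documents.
        seen = set(pool)
        for query in population:
            for doc in retrieve(query, documents):
                seen |= doc
        pool = sorted(seen)
        # Keep the fitter half and refill with mutated offspring.
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), pool, mutation_rate, rng))
        population = parents + children
    return sorted(population, key=score, reverse=True), pool

# Toy data (assumed): a topic context and a tiny corpus of term sets.
context = {"genetic", "algorithm", "query", "search"}
documents = [
    {"genetic", "algorithm", "mutation", "crossover"},
    {"query", "search", "engine", "ranking"},
    {"cooking", "recipe", "pasta"},
    {"genetic", "query", "fitness", "population"},
]
population, pool = evolve(context, documents)
```

In this sketch, raising `mutation_rate` pulls more terms from the pool into each generation, which widens exploration at the cost of disrupting good queries. Multiplying relatedness by novelty is one simple way to penalize both off-topic results and results that only repeat the context vocabulary.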
