In the life sciences, the sheer amount of available data often makes it difficult to find the desired information quickly. The research area of information retrieval (IR) addresses this problem. One of the approaches used in IR is query extension based on keyword or document clusters.
In this paper we present a detailed analysis of a keyword clustering approach using four evolutionary algorithms: evolution strategy (ES), genetic algorithm (GA), genetic algorithm with strict offspring selection (OSGA), and the multi-objective elitist non-dominated sorting genetic algorithm (NSGA-II).
We have identified features that characterize solution candidates for the keyword clustering problem, e.g., the number of documents covered and how well the identified keyword clusters match the occurrence of keywords in the given set of documents. This paper shows how these features are used and how evolutionary algorithms can be applied to optimize keyword clusters.
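As a rough illustration of the two features named above, the following sketch scores a candidate set of keyword clusters against a document collection. The function name `cluster_features`, the definition of "covered" (a document contains every keyword of at least one cluster), and the match measure (the share of documents touching a cluster that contain it completely) are assumptions made for this example and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def cluster_features(
    clusters: List[Set[str]],
    doc_keywords: Dict[int, Set[str]],
) -> Tuple[float, float]:
    """Return (coverage, mean_match) for a candidate clustering.

    Assumed definitions (hypothetical, for illustration only):
    - a document is covered if it contains all keywords of some cluster;
    - a cluster's match is the fraction of documents sharing at least one
      keyword with it that also contain all of its keywords.
    """
    covered: Set[int] = set()
    matches: List[float] = []
    for cluster in clusters:
        # Documents that share at least one keyword with this cluster.
        touching = [d for d, kws in doc_keywords.items() if cluster & kws]
        # Documents that contain the whole cluster.
        full = [d for d in touching if cluster <= doc_keywords[d]]
        covered.update(full)
        matches.append(len(full) / len(touching) if touching else 0.0)
    coverage = len(covered) / len(doc_keywords) if doc_keywords else 0.0
    mean_match = sum(matches) / len(matches) if matches else 0.0
    return coverage, mean_match


# Toy data: document ids mapped to their keyword sets.
docs = {1: {"a", "b"}, 2: {"a", "b", "c"}, 3: {"a"}, 4: {"d"}}
coverage, mean_match = cluster_features([{"a", "b"}], docs)
```

A single-objective algorithm such as ES or GA could combine the two values into one fitness score, whereas a multi-objective algorithm such as NSGA-II could treat them as separate objectives; how the paper weights them is not stated here.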
To test the approach we used a real-world data set provided for the TREC-9 conference; this collection contains information about approximately 36,000 documents from the PubMed database.
In the results section we compare the performance of the tested evolutionary algorithms and find that ES and NSGA-II in particular produce meaningful results for this document collection. The approach based on evolutionary algorithms is intended to be used for automated query extension in biomedical information retrieval in PubMed.
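To make the intended use concrete, here is a minimal sketch of cluster-based query extension: every keyword that shares a cluster with one of the original query terms is appended to the query. The function name `extend_query` and the expansion rule are assumptions for illustration; the paper does not specify how the extension is performed.

```python
from typing import List, Set


def extend_query(query_terms: List[str], clusters: List[Set[str]]) -> List[str]:
    """Append keywords from every cluster that contains an original query term.

    Hypothetical rule: only the original terms trigger an expansion, so
    added keywords do not pull in further clusters.
    """
    original = set(query_terms)
    extended = list(query_terms)
    seen = set(query_terms)
    for cluster in clusters:
        if original & cluster:
            # Sort for a deterministic order of the added keywords.
            for keyword in sorted(cluster):
                if keyword not in seen:
                    extended.append(keyword)
                    seen.add(keyword)
    return extended


clusters = [{"asthma", "inhaler"}, {"diabetes", "insulin"}]
result = extend_query(["asthma"], clusters)
```

With these toy clusters, the query `["asthma"]` gains the keyword `"inhaler"`, while the unrelated cluster is left out.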