Optimization of keyword grouping in biomedical information retrieval using evolutionary algorithms

The amount of data available in the life sciences is growing exponentially; intelligent search strategies are therefore required to find relevant information as quickly and accurately as possible. In this paper we propose a document keyword clustering approach: given a set of documents, we identify groups of keywords that occur in those documents. Once these clusters have been built, the complexity of the database can be handled much more easily: future user queries can be extended with terms found in the same clusters as the terms originally entered by the user. We present a framework for representing and evaluating keyword clusters on a given data set, as well as a simple evolutionary algorithm (based on an evolution strategy) that is intended to find optimal keyword clusters. In the empirical section of this paper we document first results obtained using a data set published at the TREC-9 conference.
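
The query extension step described above can be illustrated with a minimal sketch. It assumes keyword clusters are already available as plain sets of strings; the function name `expand_query` and the example clusters are hypothetical and are not taken from the paper, which does not specify its data structures here.

```python
from typing import List, Set


def expand_query(query_terms: List[str], clusters: List[Set[str]]) -> List[str]:
    """Extend a query with every keyword that shares a cluster with a query term.

    Assumption: clusters are plain sets of keywords (hypothetical representation).
    """
    # Keep the user's terms first, in their original order, without duplicates.
    original = list(dict.fromkeys(query_terms))
    original_set = set(original)
    expanded = list(original)

    for cluster in clusters:
        # A cluster contributes only if it contains at least one original term.
        if original_set & cluster:
            for term in sorted(cluster):  # sorted for a deterministic result
                if term not in expanded:
                    expanded.append(term)
    return expanded


# Hypothetical clusters, for illustration only.
example_clusters = [
    {"neoplasm", "tumor", "carcinoma"},
    {"insulin", "diabetes"},
]
print(expand_query(["tumor", "therapy"], example_clusters))
# prints ['tumor', 'therapy', 'carcinoma', 'neoplasm']
```

Terms that belong to no cluster, such as `therapy` in this example, pass through unchanged, so the extended query always contains the original one.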
