Selection of Information Sources Using a Genetic Algorithm

We address the problem of selecting information sources when a large number of distributed sources are available. We formulate source selection as a combinatorial optimization problem whose goal is to find the set of sources most relevant to a given query. A solution is a combination of sources drawn from a large predefined set. We propose a genetic algorithm that searches this space by maximizing the similarity between a selection and the query. Extensive experiments were performed on databases of scientific research documents covering different domains, such as computer science and medicine. The results, evaluated with the precision measure, are very encouraging.
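
The formulation can be illustrated with a minimal sketch. The abstract does not specify the encoding, the similarity function, or the genetic operators, so everything below is an assumption made for illustration: a selection is encoded as a bit mask over the sources, each source and the query are represented as hypothetical term-weight dictionaries, fitness is the cosine similarity between the query and the merged term weights of the selected sources, and the operators are binary tournament selection, one-point crossover, bit-flip mutation, and elitism. The names `cosine`, `fitness`, and `select_sources` are not taken from the paper.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def fitness(mask, sources, query):
    """Assumed fitness: similarity of the query to the merged selected sources."""
    merged = {}
    for bit, src in zip(mask, sources):
        if bit:
            for term, weight in src.items():
                merged[term] = merged.get(term, 0.0) + weight
    return cosine(merged, query)


def select_sources(sources, query, pop_size=30, generations=60,
                   p_mut=0.05, seed=0):
    """Evolve bit masks over `sources`; requires at least two sources."""
    rng = random.Random(seed)
    n = len(sources)

    def score(mask):
        return fitness(mask, sources, query)

    # Random initial population of candidate selections.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        nxt = [best[:]]  # elitism: carry the best selection forward unchanged
        while len(nxt) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randint(1, n - 1)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=score)
    return [i for i, b in enumerate(best) if b], score(best)


if __name__ == "__main__":
    # Toy data, invented for the example.
    sources = [
        {"genetic": 1.0, "algorithm": 1.0},
        {"medicine": 1.0, "clinical": 1.0},
        {"retrieval": 1.0, "query": 1.0},
        {"cooking": 1.0},
    ]
    query = {"genetic": 1.0, "retrieval": 1.0}
    chosen, sim = select_sources(sources, query)
    print(chosen, round(sim, 3))
```

Because the best individual is copied into each new generation, the best fitness never decreases across generations. A bit mask lets the algorithm choose the size of the selection on its own; if the number of sources to return is fixed in advance, a repair step or a penalty term in the fitness would be needed, which this sketch omits.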
