GeniMiner: Web Mining with a Genetic-Based Algorithm

In this paper we present a genetic search strategy for a search engine. We begin by showing that important relations exist between statistical studies of the Web, search engines, and standard optimization techniques: the Web is a graph that can be searched for relevant information using an evaluation function and operators based on standard search engines or local exploration. It is then straightforward to define an evaluation function that is a mathematical formulation of the user request, and to define a steady-state genetic algorithm that evolves a population of pages with binary tournament selection and specific operators. Individuals are created by querying standard search engines. The mutation operator explores the neighborhood of a page by following its outgoing links. We present a comparative evaluation performed with the same protocol as used in optimization. For complex queries, our tool obtains pages that are significantly better than those found by standard search engines. We conclude by showing that our framework for Web search could be generalized to other optimization techniques such as parallel genetic algorithms.
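
The search loop described above can be illustrated with a minimal sketch. It is not the GeniMiner implementation: the in-memory graph `TOY_WEB`, the keyword-count `fitness`, the 0.5 probability of choosing creation over mutation, and the replace-the-worst rule are all assumptions made for illustration, and `create_individual` merely stands in for a query to a real search engine.

```python
import random

# Hypothetical toy "Web": page id -> (page text, outgoing links).
TOY_WEB = {
    "a": ("genetic algorithm tutorial", ["b", "c"]),
    "b": ("genetic search engine for web mining", ["d"]),
    "c": ("cooking recipes", ["a"]),
    "d": ("web mining with a genetic algorithm and search engine", []),
    "e": ("search engine news", ["b", "c"]),
}

def fitness(page, keywords):
    """Assumed evaluation function: number of query keywords found in the page."""
    words = set(TOY_WEB[page][0].split())
    return sum(1 for k in keywords if k in words)

def create_individual(rng, keywords):
    """Stand-in for querying a search engine: a random page matching any keyword."""
    hits = [p for p in TOY_WEB if fitness(p, keywords) > 0]
    return rng.choice(hits)

def mutate(page, rng):
    """Explore the neighborhood of a page by following one outgoing link."""
    links = TOY_WEB[page][1]
    return rng.choice(links) if links else page

def tournament(population, keywords, rng):
    """Binary tournament: draw two individuals and keep the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a, keywords) >= fitness(b, keywords) else b

def steady_state_search(keywords, pop_size=3, steps=50, seed=0):
    rng = random.Random(seed)
    population = [create_individual(rng, keywords) for _ in range(pop_size)]
    for _ in range(steps):
        # Steady state: produce one child per step (assumed 50/50 operator choice).
        if rng.random() < 0.5:
            child = create_individual(rng, keywords)
        else:
            child = mutate(tournament(population, keywords, rng), rng)
        # Assumed replacement rule: the child replaces the worst page if it is better.
        worst = min(range(pop_size), key=lambda i: fitness(population[i], keywords))
        if fitness(child, keywords) > fitness(population[worst], keywords):
            population[worst] = child
    return max(population, key=lambda p: fitness(p, keywords))
```

Calling `steady_state_search(["web", "mining", "genetic", "search"])` returns the identifier of the best page in the final population. In a real setting the evaluation function would encode the full user request, and creation and mutation would issue network requests instead of dictionary lookups.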
