A Framework for High-Performance Web Mining in Dynamic Environments using Honeybee Search Strategies

The methodology for the knowledge discovery in databases (KDD) architecture outlines possible approaches that search engines can take to improve their information retrieval (IR) systems. The conventional approach provided the requester with query results based on the user’s knowledge of the respective IR system. This paper proposes an information sharing model based on the information processing methodology of honeybees and on knowledge discovery in databases, as an alternative to the traditional IR models used by current search engines. The major limitation of IR-based systems is their dependency on human editors, which is reflected in static sets of query terms and the use of stemming. Experimental results are presented for the data clustering component (Web page indexer) of the Tocorime Apicu search engine, which is based on the information sharing model.
