Improving Performance in Constructing Specific Web Directory Using Focused Crawler: An Experiment on Botany Domain

The growth of the web has made it difficult to search and browse for useful information, especially in specific domains. At the same time, some portions of the web remain largely underdeveloped, as shown by a lack of high-quality content. One example is the botany domain, where the lack of well-structured web directories has limited users' ability to browse for the information they need. In this research we propose an improved framework for constructing a domain-specific web directory. The framework uses an anchor directory as the foundation for a primary web directory. This directory is then completed with information that is gathered by an automatic component and filtered by experts. We conduct an experiment to evaluate the framework's effectiveness, efficiency, and satisfaction.
