A New Approach to Design a Domain Specific Web Search Crawler Using Multilevel Domain Classifier

Publishing information on the Internet has become commonplace, and as a result the volume of available information has grown enormous. To handle this volume, Web researchers have introduced various types of search engines. Efficient Web-page crawling and resource-repository building are important parts of a search engine, and researchers have already proposed a variety of Web search crawler mechanisms for different search engines. In this paper, we introduce a new mechanism for designing and developing a domain-specific Web search crawler that uses multilevel domain classifiers, crawls Web-pages related to multiple domains, and supports parallel crawling, among other features. Two domain classifiers are used to identify domain-specific Web-pages. They are applied one after the other, i.e., at two levels, which is why we call this a multilevel domain-specific Web search crawler.
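
To illustrate how two classifiers applied in sequence can filter pages, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not say how either classifier works, so the term weights in DOMAIN_TERMS, the threshold, and the function names level_one, level_two, is_domain_page, and classify_pages are all assumptions made for illustration. The first level is a cheap term-presence check, the second is a weighted score against a threshold, and a thread pool stands in for parallel crawling by classifying already-fetched page texts concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical domain vocabulary with illustrative weights (not from the paper).
DOMAIN_TERMS = {"cricket": 1.0, "batsman": 0.8, "wicket": 0.8, "innings": 0.6}


def level_one(text):
    """Assumed first-level filter: does any domain term occur at all?"""
    words = set(text.lower().split())
    return any(term in words for term in DOMAIN_TERMS)


def level_two(text, threshold=1.5):
    """Assumed second-level check: summed weight of matched terms meets a threshold."""
    score = sum(DOMAIN_TERMS.get(word, 0.0) for word in text.lower().split())
    return score >= threshold


def is_domain_page(text):
    """Apply the two classifiers one after the other; level two runs only if level one passes."""
    return level_one(text) and level_two(text)


def classify_pages(pages, workers=4):
    """Classify page texts concurrently; return the URLs accepted by both levels."""
    urls = list(pages)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verdicts = list(pool.map(lambda url: is_domain_page(pages[url]), urls))
    return [url for url, accepted in zip(urls, verdicts) if accepted]


if __name__ == "__main__":
    sample = {
        "a": "the batsman lost his wicket in the first innings",
        "b": "cricket is mentioned once here",
        "c": "stock markets fell today",
    }
    print(classify_pages(sample))
```

In this sketch, page "b" passes the first level but is rejected by the second, which shows why a second, stricter classifier can be useful after a cheap initial filter.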
