Keyword focused web crawler

The number of Internet users and the uses of the Internet are growing rapidly, which makes it increasingly difficult and laborious for users to find web pages that are relevant to their requirements. Users generally either browse a large hierarchy of concepts or submit a query to a search engine, and they receive results based on the search pattern, of which only a few are relevant and most are not. The web crawler plays an important role in a search engine and is a key element where performance is concerned. This paper combines domain engineering concepts and keyword-driven crawling with a relevancy decision mechanism, and uses ontology concepts to provide a path for improving the crawler's performance. It introduces the extraction of URLs based on a keyword or search criterion: the crawler extracts URLs of web pages whose content contains the searched keyword, treats only such pages as important, and does not download web pages that are irrelevant to the search. Compared with a traditional web crawler, the approach performs better and can improve both search efficiency and accuracy.
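The keyword-driven selection described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `keyword_crawl`, the caller-supplied `fetch` function, the `max_pages` limit, and the rule of following links only from relevant pages are all assumptions made for illustration, and the relevancy decision is reduced to a plain substring test with no ontology involved. Note that this sketch must fetch a page to inspect its content; it saves downloads only by not expanding links found on irrelevant pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class _PageParser(HTMLParser):
    """Collects text and href targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def keyword_crawl(seed_urls, keyword, fetch, max_pages=50):
    """Breadth-first crawl that keeps only pages containing `keyword`.

    `fetch(url)` is a hypothetical caller-supplied function that returns
    HTML text, or None if the page cannot be retrieved.
    Links are followed only from relevant pages (an assumed pruning rule).
    """
    keyword = keyword.lower()
    queue = deque(seed_urls)
    seen = set(seed_urls)
    relevant = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        parser = _PageParser()
        parser.feed(html)
        text = " ".join(parser.text_parts).lower()
        if keyword not in text:
            continue  # irrelevant page: drop it and do not expand its links
        relevant.append(url)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return relevant


# Tiny in-memory "web" (made-up URLs) so the sketch runs without network access.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> ontology crawler index',
    "http://example.test/a": '<p>Notes on ontology learning</p> <a href="/c">C</a>',
    "http://example.test/b": '<p>Cooking recipes</p> <a href="/d">D</a>',
    "http://example.test/c": "<p>More about Ontology design</p>",
    "http://example.test/d": "<p>ontology page linked only from an irrelevant page</p>",
}

result = keyword_crawl(["http://example.test/"], "ontology", PAGES.get)
print(result)
```

In this toy data set the page `/b` does not contain the keyword, so it is discarded and its link to `/d` is never followed. This shows the trade-off of the assumed pruning rule: fewer downloads, at the cost of missing relevant pages that are reachable only through irrelevant ones.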
