URL ordering based performance evaluation of Web crawler

The World Wide Web contains billions of Web pages, all accessible via the Internet, and most of us rely on the Internet as a source of information. This information is available on the Web in many forms, such as websites, databases, images, sound, and video. Search engines rank their results using techniques such as keyword matching and link analysis. They serve information from their own indexed databases, which contain content downloaded from Web pages; whenever a user submits a query, the results are fetched from these indexed pages. The Web crawler is the component that downloads and stores Web pages, and the crawlers of these search engines traverse a very large number of pages to gather this information. In this work, a Web crawler is developed that orders URLs on the basis of their content similarity to a query and their structural similarity. Results are reported for a keyword over five parameters: top URLs, precision, content similarity, structural similarity, and total similarity.
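The URL ordering described above can be illustrated with a minimal sketch. The abstract does not give the scoring formulas, so everything here is an assumption made for illustration: content similarity is taken as cosine similarity between term-frequency vectors, structural similarity as the Jaccard overlap of outlink sets, and total similarity as a weighted sum of the two. The names `content_similarity`, `structural_similarity`, `order_urls`, `precision_at_k`, and the `weight` parameter are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def content_similarity(query, text):
    """Cosine similarity of term-frequency vectors (assumed measure)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    dot = sum(q[w] * t[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in t.values())))
    return dot / norm if norm else 0.0


def structural_similarity(links_a, links_b):
    """Jaccard overlap of two outlink sets (assumed stand-in measure)."""
    a, b = set(links_a), set(links_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def order_urls(query, seed_links, candidates, weight=0.5):
    """Sort candidate URLs by descending total similarity.

    candidates maps url -> (page_text, outlinks). The total score is an
    assumed weighted sum of content and structural similarity.
    """
    scored = []
    for url, (text, outlinks) in candidates.items():
        c = content_similarity(query, text)
        s = structural_similarity(seed_links, outlinks)
        scored.append((weight * c + (1 - weight) * s, url))
    # Highest score first; ties broken alphabetically by URL.
    return [url for _, url in sorted(scored, key=lambda p: (-p[0], p[1]))]


def precision_at_k(ordered_urls, relevant, k):
    """Fraction of the top-k ordered URLs that are in the relevant set."""
    top = ordered_urls[:k]
    return sum(1 for u in top if u in relevant) / len(top) if top else 0.0


if __name__ == "__main__":
    pages = {
        "http://example.org/a": ("web crawler url ordering", ["x", "y"]),
        "http://example.org/b": ("cooking recipes", ["z"]),
    }
    ranking = order_urls("web crawler", ["x", "y"], pages)
    print(ranking)
    print(precision_at_k(ranking, {"http://example.org/a"}, 1))
```

In a real crawler the ordered list would feed the frontier queue, so that pages scoring higher against the query are downloaded first; the equal weighting used here is only a placeholder choice.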
