Investigation of the accuracy of search engine hit counts

This study investigates the accuracy of search engine hit counts for search queries. We examine hit counts from Google, Yahoo and Microsoft Live Search for both single-term and multiple-term queries, and we also track the consistency of hit count estimates over 15 days. The results show that all three search engines provide estimates of the number of matching documents and that the estimation patterns of their counting algorithms differ greatly. The accuracy of hit counts for multiple-word queries has not been studied before. Our results show that the number of words in a query significantly affects the accuracy of the estimates: the percentage of accurate hit count estimates is reduced by almost half when going from single-word to two-word queries in all three search engines. As the number of query words increases, the estimation error increases and the number of accurate estimates decreases.
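The abstract does not state how an estimate was judged "accurate", so the following is only a minimal sketch of one way the two reported quantities (estimation error and percentage of accurate estimates per query length) could be computed. The 10% tolerance, the function names `relative_error` and `accuracy_by_query_length`, the use of the number of documents actually returned as the reference value, and the sample records are all assumptions made for illustration; the numbers are invented and are not data from the study.

```python
from collections import defaultdict


def relative_error(estimated, actual):
    """Relative error of a hit count estimate against the reference count.

    `actual` is assumed here to be the number of documents the engine
    really returns for the query (an assumption, not the paper's definition).
    """
    if actual == 0:
        return 0.0 if estimated == 0 else float("inf")
    return abs(estimated - actual) / actual


def accuracy_by_query_length(records, tolerance=0.10):
    """Percentage of accurate estimates, grouped by number of query words.

    records: iterable of (query, estimated, actual) tuples.
    tolerance: hypothetical threshold below which an estimate counts as accurate.
    """
    totals = defaultdict(int)
    accurate = defaultdict(int)
    for query, estimated, actual in records:
        n_words = len(query.split())
        totals[n_words] += 1
        if relative_error(estimated, actual) <= tolerance:
            accurate[n_words] += 1
    return {n: 100.0 * accurate[n] / totals[n] for n in sorted(totals)}


# Invented records for illustration only: (query, estimated hits, returned hits).
sample = [
    ("zebra", 520, 500),
    ("quokka", 130, 128),
    ("zebra habitat", 900, 410),
    ("quokka diet", 75, 72),
]

print(accuracy_by_query_length(sample))
```

Grouping by word count keeps single-word and multiple-word queries separate, which is the comparison the abstract describes; a different tolerance or a different reference count would change the percentages.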
