Precision Evaluation of Search Engines

In this paper, we present a general approach for statistically evaluating the precision of Web search engines. Search engines are evaluated in two steps over a large number of sample queries: (a) computing relevance scores for the hits returned by each search engine, and (b) ranking the search engines through statistical comparison of those relevance scores. To compute relevance scores of hits, we study four scoring algorithms. Three are variations of algorithms widely used in traditional information retrieval: cover density ranking, Okapi similarity measurement, and the vector space model. In addition, we develop a new three-level scoring algorithm that mimics commonly used manual evaluation approaches. To rank the search engines by precision, we apply a statistical metric called the probability of win. In our experiments, six popular search engines, AltaVista, Fast, Google, Go, iWon, and NorthernLight, were evaluated on queries from two domains of interest: knowledge and data engineering, and parallel and distributed processing. The first query set contains 1726 queries collected from the index terms of papers published in the IEEE Transactions on Knowledge and Data Engineering; the second contains 1383 queries collected from the index terms of papers published in the IEEE Transactions on Parallel and Distributed Systems. Search engines were queried and compared in two search modes: the default search mode and the exact phrase search mode. Our experimental results show that the six search engines performed differently under different search modes and scoring methods. Overall, Google was the best. NorthernLight was mostly second in the default search mode, whereas iWon was mostly second in the exact phrase search mode.
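
As an illustration of the Okapi-style scoring in step (a), the sketch below computes a BM25-type similarity between a query and the text of one retrieved hit. This is a minimal sketch, not the paper's implementation: the abstract does not give the exact Okapi variant, constants, or collection statistics, so the parameters k1 and b and the doc_freq, num_docs, and avg_doc_len inputs are assumptions for illustration only.

```python
import math
from collections import Counter

def okapi_bm25_score(query_terms, doc_tokens, doc_freq, num_docs, avg_doc_len,
                     k1=1.2, b=0.75):
    """Okapi BM25 similarity of one retrieved page against a query.

    k1, b, doc_freq, num_docs, and avg_doc_len are illustrative assumptions;
    the abstract does not specify the exact Okapi variant used in the paper.
    """
    tf = Counter(doc_tokens)          # term frequencies in the hit's text
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        df = doc_freq.get(term, 0)    # number of documents containing the term
        if df == 0:
            continue
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        freq = tf[term]
        norm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * norm
    return score

# Toy usage: score one hit for the query "knowledge discovery".
hit_text = "a survey of knowledge discovery and data mining techniques".split()
print(okapi_bm25_score(["knowledge", "discovery"], hit_text,
                       doc_freq={"knowledge": 150, "discovery": 90},
                       num_docs=10000, avg_doc_len=300))
```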

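For step (b), the abstract names a "probability of win" metric but does not define it. The sketch below shows one plausible reading, assuming the metric is estimated as the probability that one engine's true mean relevance score exceeds another's, using a normal approximation of the difference of the two sample means over the same query set; both the function name and the approximation are assumptions, not the paper's definition.

```python
import math
from statistics import mean, variance

def probability_of_win(scores_a, scores_b):
    """Probability that engine A's mean relevance score exceeds engine B's.

    Assumption: the metric is read here as P(mean_A > mean_B) under a normal
    approximation of the difference of sample means; the abstract itself does
    not define the statistic.
    """
    diff = mean(scores_a) - mean(scores_b)
    se = math.sqrt(variance(scores_a) / len(scores_a) +
                   variance(scores_b) / len(scores_b))
    if se == 0.0:
        return 0.5 if diff == 0 else (1.0 if diff > 0 else 0.0)
    z = diff / se
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # standard normal CDF

# Toy usage: per-query relevance scores of two engines over the same queries.
engine_a = [0.8, 0.6, 0.9, 0.7, 0.5]
engine_b = [0.6, 0.5, 0.7, 0.6, 0.4]
print(probability_of_win(engine_a, engine_b))
```

Ranking all six engines would then reduce to aggregating such pairwise comparisons, for example by averaging each engine's win probability against the others.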