Study on the Click Context of Web Search Users for Reliability Analysis

Analysis of user behavior information has been shown to be important for optimizing and evaluating Web search, and it has become a major research area in both information retrieval and knowledge management. This paper studies the reliability of users' search behavior using large-scale query and click-through logs collected from commercial search engines. Reliability is defined probabilistically. The context of user click behavior on search results is analyzed in terms of relevance. Five features, namely query number, click entropy, first click ratio, last click ratio, and rank position, are proposed and studied to separate reliable user clicks from the others. Experimental results show that the proposed method evaluates the reliability of user behavior effectively: the area under the ROC curve (AUC) is 0.792, and the algorithm retains 92.8% of relevant clicks when filtering out 40% of low-quality clicks.
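The abstract names click entropy as a feature but does not define it. As an illustration only, the sketch below assumes the common definition: the Shannon entropy (in bits) of the distribution of clicks over result URLs for a single query. The function name `click_entropy` and the input format (a flat list of clicked URLs) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def click_entropy(clicked_urls):
    """Shannon entropy (bits) of the click distribution for one query.

    Assumed definition, not confirmed by the paper:
    H = sum over URLs of p(url) * log2(1 / p(url)),
    where p(url) is the share of the query's clicks that landed on url.
    """
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    if total == 0:
        # No clicks recorded: treat the entropy as zero.
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical click logs for two queries.
focused = ["a.com", "a.com", "a.com", "a.com"]   # all clicks on one result
spread = ["a.com", "b.com", "c.com", "d.com"]    # clicks spread evenly

print(click_entropy(focused))  # 0.0
print(click_entropy(spread))   # 2.0
```

Under this assumed definition, a low value means users agree on which result to click, while a high value means clicks are scattered across many results.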
