User Preference Prediction in Mobile Search

As search requests from mobile devices grow rapidly, mobile search evaluation has become a central concern in mobile search studies. Beyond the traditional Cranfield paradigm, side-by-side user preference between two ranked lists does not rely on user behavior assumptions and has been shown to produce more accurate results than traditional evaluation methods based on "query-document" relevance. On the other hand, result-list preference judgments carry a very high annotation cost. Previous studies attempted to assist human judges by automatically predicting preference, but whether these models remain effective in the mobile search environment is still an open question. In this paper, we propose a machine learning model to predict user preference automatically in the mobile search environment. We find that relevance features predict user preference well, so we compare the agreement of evaluation metrics with side-by-side user preferences on our dataset. Inspired by this agreement comparison, we propose new relevance features for building our models. Experimental results show that the proposed model predicts user preference effectively.
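
The following is a minimal sketch, not the authors' implementation, of the general idea described above: each query pair contributes relevance-based features (differences of evaluation metrics between list A and list B), and a classifier is trained to predict which list users prefer. The metric choices, feature set, and the synthetic data are illustrative assumptions only.

```python
# Sketch: predicting side-by-side preference from relevance-metric difference features.
# All names, parameters, and the toy labels below are hypothetical stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def dcg(rels, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels."""
    rels = np.asarray(rels, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    return float((rels * discounts).sum())

def rbp(rels, p=0.8, k=10):
    """Rank-biased precision with persistence parameter p."""
    rels = np.asarray(rels, dtype=float)[:k]
    weights = (1 - p) * p ** np.arange(rels.size)
    return float((rels * weights).sum())

def preference_features(rels_a, rels_b):
    """Per-query features: metric differences between the two ranked lists."""
    return [dcg(rels_a) - dcg(rels_b),
            rbp(rels_a) - rbp(rels_b),
            float(rels_a[0]) - float(rels_b[0])]  # difference in top-result relevance

# Toy example: synthetic graded relevance labels (0-3) for pairs of ranked lists.
rng = np.random.default_rng(0)
X, y = [], []
for _ in range(500):
    rels_a = rng.integers(0, 4, size=10)
    rels_b = rng.integers(0, 4, size=10)
    X.append(preference_features(rels_a, rels_b))
    # Stand-in "user preference" label: here, simply prefer the list with higher DCG.
    y.append(int(dcg(rels_a) > dcg(rels_b)))

clf = LogisticRegression()
print("mean CV accuracy:", cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```

In practice the labels would come from human side-by-side preference judgments rather than a metric, which is exactly what makes the agreement comparison between metrics and user preferences informative.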
