User modeling in search logs via a nonparametric Bayesian approach

Searchers' information needs are diverse and span a broad range of topics, so a search engine must accurately understand each individual user's search intents in order to return the best results. Search log data, which records users' behaviors when interacting with search engines, is a valuable source of information about those intents. Properly characterizing the heterogeneity among users' observed search behaviors is therefore key to understanding their search intents and, in turn, to predicting their behaviors. In this work, we study the problem of user modeling in search log data and propose a generative model, dpRank, within a nonparametric Bayesian framework. By postulating generative assumptions about a user's search behaviors, dpRank jointly identifies each individual user's latent search interests and distinct result preferences. Experimental results on a large-scale news search log data set validate the effectiveness of the proposed approach, which not only provides an in-depth understanding of a user's search intents but also benefits a variety of personalized applications.
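The abstract does not spell out the prior behind dpRank. As a rough illustration of what a nonparametric Bayesian treatment of user heterogeneity can look like, the sketch below draws a partition of users from a Chinese restaurant process, a standard construction of a Dirichlet process prior. That dpRank uses this particular prior is an assumption suggested only by its name; the function `crp_assignments`, the concentration parameter `alpha`, and the notion of a "user group" are hypothetical names chosen for this example and are not taken from the paper.

```python
import random


def crp_assignments(num_users, alpha, seed=0):
    """Draw a partition of users from a Chinese restaurant process.

    Illustrative only: this is a generic nonparametric prior over
    groupings, not the dpRank model itself. alpha must be positive.
    """
    rng = random.Random(seed)
    assignments = []  # group index assigned to each user, in arrival order
    group_sizes = []  # number of users currently in each group
    for _ in range(num_users):
        # An existing group is chosen with weight equal to its size;
        # a brand-new group is opened with weight alpha.
        weights = group_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(group_sizes):
            group_sizes.append(1)  # open a new group
        else:
            group_sizes[k] += 1    # join an existing group
        assignments.append(k)
    return assignments, group_sizes


if __name__ == "__main__":
    groups, sizes = crp_assignments(num_users=20, alpha=1.0)
    print("group per user:", groups)
    print("group sizes:", sizes)
```

The number of groups is not fixed in advance: it grows with the number of users at a rate governed by `alpha`, which is the property that makes such priors attractive when the number of distinct user interest profiles is unknown.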
