Behavioral dynamics on the web: Learning, modeling, and prediction

The queries people issue to a search engine, and the results they click after issuing a query, change over time. For example, after the earthquake in Japan in March 2011, the query "japan" spiked in popularity, and people issuing the query were more likely to click government-related results than they had been before the earthquake. We explore the modeling and prediction of such temporal patterns in Web search behavior. We develop a temporal modeling framework adapted from physics and signal processing and harness it to predict temporal patterns in search behavior using smoothing, trends, periodicities, and surprises. Using current and past behavioral data, we develop a learning procedure that can be used to construct models of users' Web search activities. We also develop a novel methodology that learns to select the best prediction model from a family of predictive models for a given query or a class of queries. Experimental results indicate that the predictive models significantly outperform baseline models that weight historical evidence the same for all queries. We present two applications where the new methods introduced for the temporal modeling of user behavior significantly improve upon the state of the art. Finally, we discuss opportunities for using models of temporal dynamics to enhance other areas of Web search and information retrieval.
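
To make the ideas of smoothing, trends, and per-query model selection concrete, the following is a minimal sketch and not the framework described above. It assumes a query's popularity is available as a list of per-period counts, uses textbook simple exponential smoothing and Holt's linear-trend smoothing as the model family, and picks a model by rolling one-step-ahead error rather than by a learned selector. All function names, the `CANDIDATES` table, and the parameter values are hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Sequence

Forecaster = Callable[[Sequence[float]], float]


def simple_smoothing(series: Sequence[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast from simple exponential smoothing (level only)."""
    level = float(series[0])
    for y in series[1:]:
        # Blend the newest observation with the running level.
        level = alpha * y + (1 - alpha) * level
    return level


def holt_trend(series: Sequence[float], alpha: float = 0.5, beta: float = 0.5) -> float:
    """One-step-ahead forecast from Holt's linear method (level plus trend).

    Requires at least two observations to initialize the trend.
    """
    level = float(series[0])
    trend = float(series[1] - series[0])
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend


def rolling_error(model: Forecaster, series: Sequence[float], min_history: int = 2) -> float:
    """Mean absolute one-step-ahead error, forecasting each point from its past only."""
    errors = [
        abs(model(series[:t]) - series[t])
        for t in range(min_history, len(series))
    ]
    return sum(errors) / len(errors)


def select_model(models: Dict[str, Forecaster], series: Sequence[float]) -> str:
    """Return the name of the candidate with the lowest rolling error on this series."""
    return min(models, key=lambda name: rolling_error(models[name], series))


# Hypothetical model family; a fuller version might add periodic and surprise terms.
CANDIDATES: Dict[str, Forecaster] = {
    "smoothing": simple_smoothing,
    "trend": holt_trend,
}

if __name__ == "__main__":
    rising_query = [1, 2, 3, 4, 5, 6]  # made-up counts for a steadily rising query
    best = select_model(CANDIDATES, rising_query)
    print(best, CANDIDATES[best](rising_query))
```

Selecting per series captures the motivation that different queries call for different weightings of historical evidence. A learned selector, as the abstract describes, would instead predict the best model for a query or class of queries rather than re-evaluating every candidate on each series.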
