Why we search: visualizing and predicting user behavior

The aggregation and comparison of behavioral patterns on the WWW represent a tremendous opportunity for understanding past behaviors and predicting future behaviors. In this paper, we take a first step toward achieving this goal. We present a large-scale study correlating the behaviors of Internet users on multiple systems ranging in size from 27 million queries to 14 million blog posts to 20,000 news articles. We formalize a model for events in these time-varying datasets and study their correlation. We have created an interface for analyzing the datasets, which includes a novel visual artifact, the DTWRadar, for summarizing differences between time series. Using our tool, we identify a number of behavioral properties that allow us to understand the predictive power of patterns of use.
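As a rough illustration of the kind of time-series comparison that a DTW-based summary builds on, the following is a minimal sketch of classic dynamic time warping between two short series. It is not the implementation used in this work: the function name `dtw_distance`, the absolute-difference cost, and the toy inputs are assumptions made only for this example.

```python
def dtw_distance(a, b):
    """Return the dynamic time warping cost between two numeric sequences.

    Hypothetical helper for illustration: uses absolute difference as the
    local cost and the standard three-way recurrence (match, insert, delete).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy example (made-up values): the second series is a delayed copy of the first.
queries = [0, 1, 2, 1, 0]
posts = [0, 0, 1, 2, 1, 0]
print(dtw_distance(queries, posts))
```

Because warping lets one series "wait" while the other advances, a series that merely lags another yields a small cost, which is why this family of measures suits comparing activity that peaks at slightly different times across sources.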
