User Session Identification Based on Strong Regularities in Inter-activity Time

Session identification is a common strategy used to develop metrics for web analytics and to perform behavioral analyses of user-facing systems. Past work has argued that session identification strategies based on an inactivity threshold are inherently arbitrary, or has advocated setting the threshold at about 30 minutes. In this work, we demonstrate a strong regularity in the temporal rhythms of user-initiated events across several different domains of online activity (including video gaming, search, page views, and volunteer contributions). We describe a methodology for identifying clusters of user activity and argue that the regularity with which these activity clusters appear implies a good rule-of-thumb inactivity threshold of about 1 hour. We conclude by discussing the implications these temporal rhythms may have for system design, based on our observations and on theories of goal-directed human activity.
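
The idea of deriving an inactivity threshold from inter-activity times can be illustrated with a small sketch. The abstract does not spell out the clustering procedure, so this code *assumes* one plausible approach. It takes the log of the gaps between a user's consecutive events, fits a two-component Gaussian mixture by expectation-maximization (one component for within-session gaps, one for between-session gaps), and places the threshold at the point where the between-session component becomes the more probable one. All function names are hypothetical. The 3600-second default in `split_sessions` reflects the abstract's rule of thumb of about 1 hour. The synthetic data in the demo is illustrative only and does not come from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers. The two-component mixture on log gaps is an assumed
# stand-in for the clustering methodology, not necessarily the paper's method.
Component = Tuple[float, float, float]  # (weight, mean, sd), in log-seconds


def split_sessions(timestamps: Sequence[float], threshold: float = 3600.0) -> List[List[float]]:
    """Group one user's event times (seconds) into sessions: a new session
    starts whenever the gap since the previous event exceeds `threshold`."""
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold:
            sessions[-1].append(t)
        else:
            sessions.append([t])
    return sessions


def inter_activity_gaps(timestamps: Sequence[float]) -> List[float]:
    """Time between consecutive events of one user, in seconds."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def _log_weighted_density(x: float, comp: Component) -> float:
    """log(weight * Normal(x | mean, sd))."""
    w, mu, sd = comp
    return math.log(w) - 0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def _posterior_first(x: float, a: Component, b: Component) -> float:
    """P(x came from component a), computed in log space to avoid underflow."""
    la, lb = _log_weighted_density(x, a), _log_weighted_density(x, b)
    m = max(la, lb)
    return math.exp(la - m) / (math.exp(la - m) + math.exp(lb - m))


def fit_two_gaussians(xs: Sequence[float], iterations: int = 100) -> Tuple[Component, Component]:
    """Fit a two-component 1-D Gaussian mixture by expectation-maximization.
    Returns the components ordered by mean (within-session first)."""
    data = sorted(xs)
    n = len(data)
    if n < 2 or iterations < 1:
        raise ValueError("need at least two observations and one iteration")
    # Initial hard assignment: lower half of the data to the first component.
    resp = [1.0 if i < n // 2 else 0.0 for i in range(n)]
    for _ in range(iterations):
        # M-step: responsibility-weighted weight, mean and sd per component.
        comps = []
        for r in (resp, [1.0 - p for p in resp]):
            nk = max(sum(r), 1e-9)
            mu = sum(p * x for p, x in zip(r, data)) / nk
            var = sum(p * (x - mu) ** 2 for p, x in zip(r, data)) / nk
            comps.append((nk / n, mu, max(math.sqrt(var), 1e-3)))
        # E-step: responsibility of the first component for every point.
        resp = [_posterior_first(x, comps[0], comps[1]) for x in data]
    low, high = sorted(comps, key=lambda c: c[1])
    return low, high


def estimate_threshold(gaps: Sequence[float], steps: int = 1000) -> float:
    """Fit the mixture to log inter-activity times and return, in seconds, the
    first point between the two means where the between-session component
    becomes the more probable one."""
    low, high = fit_two_gaussians([math.log(g) for g in gaps if g > 0])
    for i in range(steps + 1):
        x = low[1] + (high[1] - low[1]) * i / steps
        if _posterior_first(x, low, high) < 0.5:
            return math.exp(x)
    return math.exp(high[1])


if __name__ == "__main__":
    # Synthetic, illustrative data only: many short within-session gaps
    # (around a minute) and fewer long between-session gaps (around a day).
    rng = random.Random(0)
    gaps = [rng.lognormvariate(math.log(60), 1.0) for _ in range(900)]
    gaps += [rng.lognormvariate(math.log(86400), 1.0) for _ in range(100)]
    print(f"estimated threshold: {estimate_threshold(gaps) / 60:.0f} minutes")
```

Working in log space matters here because inter-activity times range from seconds to days. On a log scale, within-session and between-session gaps show up as two separable modes rather than one long tail. If the roughly 1 hour rule of thumb is accepted as is, `split_sessions(timestamps)` applies it directly without fitting anything.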
