Characterizing typical and atypical user sessions in clickstreams

Millions of users retrieve information from the Internet using search engines. Mining their sessions can provide valuable information about the quality of the user experience and the perceived quality of search results. Search engines often rely on accurate estimates of click-through rate (CTR) to evaluate the quality of the user experience. The vast heterogeneity of the user population and the presence of automated software programs (bots) can result in high variance in CTR estimates. To improve the estimation accuracy of user experience metrics such as CTR, we argue that it is important to identify typical and atypical user sessions in clickstreams. Our approach identifies these sessions by detecting outliers using the Mahalanobis distance in the user session space. Our user session model incorporates several key clickstream characteristics, including a novel conformance score obtained by Markov chain analysis. Editorial results show that our approach to identifying typical and atypical sessions has a precision of about 89%. Filtering out the atypical sessions reduces the uncertainty (95% confidence interval) of the mean CTR by about 40%. These results demonstrate that identifying typical and atypical user sessions is valuable for cleaning "noisy" user session data and thereby increasing the accuracy of user experience evaluation.
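To make the outlier-detection step concrete, the following is a minimal sketch of flagging atypical sessions by Mahalanobis distance. It is an illustration under stated assumptions, not the method as implemented in the paper: the three session features, the synthetic data, the function names, and the cutoff of 3.0 are all hypothetical, and the Markov chain conformance score is not modeled here.

```python
import numpy as np


def mahalanobis_distances(features):
    """Return the Mahalanobis distance of each row from the sample mean."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(np.atleast_2d(cov))
    centered = X - mean
    # Row-wise quadratic form: d_i^2 = (x_i - mean)^T cov_inv (x_i - mean)
    squared = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(squared, 0.0))


def flag_atypical(features, threshold=3.0):
    """Mark sessions whose distance exceeds an assumed cutoff (hypothetical value)."""
    return mahalanobis_distances(features) > threshold


# Hypothetical per-session features: [clicks, queries, seconds in session].
rng = np.random.default_rng(0)
typical = rng.normal(loc=[3.0, 2.0, 120.0], scale=[1.0, 0.5, 30.0], size=(200, 3))
bot_like = np.array([[60.0, 45.0, 5.0]])  # many clicks and queries in very little time
sessions = np.vstack([typical, bot_like])

atypical = flag_atypical(sessions)
```

In this sketch the mean and covariance are estimated from all sessions, including the outliers themselves, which inflates the covariance and can mask milder anomalies. A robust covariance estimate would be a natural refinement, and a practical cutoff would be chosen from the data rather than fixed in advance.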
