Unsupervised Clickstream Clustering for User Behavior Analysis

Online services are increasingly dependent on user participation. Whether the service is an online social network or a crowdsourcing platform, understanding user behavior is important yet challenging. In this paper, we build an unsupervised system that captures dominant user behaviors from clickstream data (traces of users' click events) and visualizes the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph, in which nodes are users and edges are weighted by clickstream similarity. The partitioning process uses iterative feature pruning to capture the natural hierarchy within user clusters and to produce intuitive features for visualizing and understanding the captured behaviors. For evaluation, we present case studies on two large-scale clickstream traces (142 million events) from real social networks. Our system effectively identifies previously unknown behaviors, such as dormant users and hostile chatters. In addition, our user study shows that people can easily interpret the identified behaviors using our visualization tool.
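
To make the similarity-graph idea concrete, the following is a minimal sketch and not the system described in the paper. It assumes, purely for illustration, that each user's clickstream is summarized by counts of k-grams (runs of k consecutive events), that edge weights are cosine similarities between those counts, and that clusters are the connected components left after dropping weak edges. That last step is a simple stand-in for graph partitioning; the iterative feature pruning and hierarchy are not shown. All function names, the threshold, and the sample traces are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kgram_counts(events, k=2):
    """Count the k-grams (runs of k consecutive click events) in one trace."""
    return Counter(tuple(events[i:i + k]) for i in range(len(events) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_graph(traces, k=2):
    """Return {(user_u, user_v): weight} for every unordered pair of users."""
    feats = {user: kgram_counts(events, k) for user, events in traces.items()}
    return {(u, v): cosine(feats[u], feats[v])
            for u, v in combinations(sorted(feats), 2)}


def threshold_clusters(users, graph, threshold=0.5):
    """Stand-in partitioning (an assumption, not the paper's method):
    keep edges with weight >= threshold and return connected components."""
    parent = {u: u for u in users}

    def find(u):
        # Union-find lookup with path halving.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for (u, v), weight in graph.items():
        if weight >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for u in users:
        groups.setdefault(find(u), set()).add(u)
    return sorted(groups.values(), key=sorted)


# Hypothetical traces: two "browsing" users and two "chatting" users.
traces = {
    "alice": ["login", "browse", "browse", "browse", "logout"],
    "bob": ["login", "browse", "browse", "logout"],
    "carol": ["login", "chat", "chat", "chat", "chat"],
    "dave": ["login", "chat", "chat", "block", "chat"],
}
graph = similarity_graph(traces)
clusters = threshold_clusters(traces, graph)
print(clusters)
```

In this toy input the browsing users and the chatting users share no 2-grams, so their edge weights are zero and they fall into separate groups. A real trace would need a partitioning method that scales and a principled way to choose features and thresholds.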
