Clickstream User Behavior Models

The next generation of Internet services is driven by users and user-generated content. The complex nature of user behavior makes it highly challenging to manage and secure online services. On one hand, service providers cannot effectively prevent attackers from creating large numbers of fake identities to disseminate unwanted content (e.g., spam). On the other hand, abusive behavior from real users also poses significant threats (e.g., cyberbullying). In this article, we propose clickstream models to characterize user behavior in large online services. By analyzing clickstream traces (i.e., sequences of click events from users), we seek to achieve two goals: (1) detection: to capture distinct user groups for the detection of malicious accounts, and (2) understanding: to extract semantic information from user groups to understand the captured behavior. To achieve these goals, we build two related systems. The first is a semi-supervised system to detect malicious user accounts (Sybils). The core idea is to build a clickstream similarity graph in which each node is a user and each edge captures the similarity of two users’ clickstreams. Based on this graph, we propose a coloring scheme to identify groups of malicious accounts without relying on a large labeled dataset. We validate the system using ground-truth clickstream traces of 16,000 real and Sybil users from Renren, a large Chinese social network. The second is an unsupervised system that aims to capture and understand fine-grained user behavior. Instead of performing binary classification (malicious or benign), this model identifies the natural groups of user behavior and automatically extracts features to interpret their semantic meaning. Applying this system to Renren and to another online social network, Whisper (100K users), we help service providers identify unexpected user behaviors and even predict users’ future actions. Both systems received positive feedback from our industrial collaborators, including Renren, LinkedIn, and Whisper, after testing on their internal clickstream data.
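To make the clickstream similarity graph concrete, the following is a minimal sketch and not the system evaluated in this article. It assumes that similarity is measured as the Jaccard overlap of each user's set of consecutive click pairs (bigrams), that edges below a fixed threshold are dropped, and that groups are the connected components of the resulting graph. The coloring step is likewise a simplified stand-in: a group is marked benign if it contains at least one known-benign seed user, and suspicious otherwise. All function names, the event names, and the threshold value are hypothetical.

```python
from itertools import combinations


def ngrams(events, n=2):
    """Return the set of length-n runs of consecutive click events."""
    return {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(clickstreams, n=2, threshold=0.5):
    """Build {user: {neighbor: weight}}, keeping edges with weight >= threshold.

    Assumption: the similarity measure and threshold are illustrative choices.
    """
    grams = {user: ngrams(events, n) for user, events in clickstreams.items()}
    graph = {user: {} for user in clickstreams}
    for u, v in combinations(clickstreams, 2):
        weight = jaccard(grams[u], grams[v])
        if weight >= threshold:
            graph[u][v] = weight
            graph[v][u] = weight
    return graph


def connected_groups(graph):
    """Return the connected components of the graph as a list of sets."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node].keys() - component)
        seen |= component
        components.append(component)
    return components


def color_groups(components, benign_seeds):
    """Label every user by whether their group contains a known-benign seed.

    Assumption: a simplified stand-in for a seed-based coloring scheme.
    """
    labels = {}
    for component in components:
        label = "benign" if component & benign_seeds else "suspicious"
        for user in component:
            labels[user] = label
    return labels


# Hypothetical toy traces: two browsing users and two friend-request spammers.
clickstreams = {
    "alice": ["home", "profile", "photo", "comment", "home"],
    "bob": ["home", "profile", "photo", "comment"],
    "s1": ["home", "friend_request", "friend_request", "friend_request"],
    "s2": ["home", "friend_request", "friend_request"],
}
graph = similarity_graph(clickstreams)
labels = color_groups(connected_groups(graph), benign_seeds={"alice"})
```

In this toy example only one labeled user ("alice") is needed: "bob" inherits the benign label through graph proximity, while the group with no seed is flagged. This illustrates why a graph-based approach can work with a small labeled set, under the assumptions stated above.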

[1]  Erik W. Black,et al.  Anonymous social media - Understanding the content and context of Yik Yak , 2016, Comput. Hum. Behav..

[2]  Aziz Mohaisen,et al.  Measuring the mixing time of social graphs , 2010, IMC '10.

[3]  Anja Feldmann,et al.  Understanding online social network usage from a network perspective , 2009, IMC '09.

[4]  Philip Chan,et al.  Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms , 2004, 16th IEEE International Conference on Tools with Artificial Intelligence.

[5]  M. Levandowsky,et al.  Distance between Sets , 1971, Nature.

[6]  Seungyeop Han,et al.  Exploring Cyberbullying and Other Toxic Behavior in Team Competition Online Games , 2015, CHI.

[7]  Chris Kimble,et al.  UBB mining: finding unexpected browsing behaviour in clickstream data to improve a Web site's design , 2005, The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05).

[8]  Jean-Loup Guillaume,et al.  Fast unfolding of communities in large networks , 2008, 0803.0476.

[9]  J. A. Hartigan,et al.  A k-means clustering algorithm , 1979 .

[10]  Venkatesan Guruswami,et al.  CopyCatch: stopping group attacks by spotting lockstep behavior in social networks , 2013, WWW.

[11]  Aaron Halfaker,et al.  Using edit sessions to measure participation in wikipedia , 2013, CSCW.

[12]  Arjun Mukherjee,et al.  Spotting fake reviewer groups in consumer reviews , 2012, WWW.

[13]  Gang Wang,et al.  Serf and turf: crowdturfing for fun and profit , 2011, WWW.

[14]  Alex Hai Wang,et al.  Don't follow me: Spam detection in Twitter , 2010, 2010 International Conference on Security and Cryptography (SECRYPT).

[15]  Ben Y. Zhao,et al.  Uncovering social network Sybils in the wild , 2011, ACM Trans. Knowl. Discov. Data.

[16]  Christos Faloutsos,et al.  CatchSync: catching synchronized behavior in large directed graphs , 2014, KDD.

[17]  Jeffrey Heer,et al.  Mining the Structure of User Activity using Cluster Stability , 2002 .

[18]  Lin Lu,et al.  Mining Significant Usage Patterns from Clickstream Data , 2005, WEBKDD.

[19]  Mira Dontcheva,et al.  MatrixWave: Visual Comparison of Event Sequence Data , 2015, CHI.

[20]  Vern Paxson,et al.  @spam: the underground on 140 characters or less , 2010, CCS '10.

[21]  Dawn Xiaodong Song,et al.  Suspended accounts in retrospect: an analysis of twitter spam , 2011, IMC '11.

[22]  Arindam Banerjee,et al.  Concept-based clustering of clickstream data , 2000 .

[23]  George Danezis,et al.  SybilInfer: Detecting Sybil Nodes using Social Networks , 2009, NDSS.

[24]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[25]  Rossano Schifanella,et al.  A Large-Scale Study of User Image Search Behavior on the Web , 2015, CHI.

[26]  Stefan Savage,et al.  Dirty Jobs: The Role of Freelance Labor in Web Service Abuse , 2011, USENIX Security Symposium.

[27]  Jaideep Srivastava,et al.  Web usage mining: discovery and applications of usage patterns from Web data , 2000, SKDD.

[28]  George Karypis,et al.  Multilevel k-way Partitioning Scheme for Irregular Graphs , 1998, J. Parallel Distributed Comput..

[29]  Calton Pu,et al.  Reverse Social Engineering Attacks in Online Social Networks , 2011, DIMVA.

[30]  Vern Paxson,et al.  Adapting Social Spam Infrastructure for Political Censorship , 2012, LEET.

[31]  John R. Douceur,et al.  The Sybil Attack , 2002, IPTPS.

[32]  Feng Xiao,et al.  SybilLimit: A Near-Optimal Social Network Defense against Sybil Attacks , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[33]  Virgílio A. F. Almeida,et al.  Detecting Spammers on Twitter , 2010 .

[34]  Virgílio A. F. Almeida,et al.  Characterizing user behavior in online social networks , 2009, IMC '09.

[35]  Qiang Cao,et al.  Uncovering Large Groups of Active Malicious Accounts in Online Social Networks , 2014, CCS.

[36]  Shivakant Mishra,et al.  Analyzing Negative User Behavior in a Semi-anonymous Social Network , 2014, ArXiv.

[37]  Chris Kanich,et al.  Re: CAPTCHAs-Understanding CAPTCHA-Solving Services in an Economic Context , 2010, USENIX Security Symposium.

[38]  Jun Hu,et al.  Detecting and characterizing social spam campaigns , 2010, IMC '10.

[39]  Hans-Peter Kriegel,et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.

[40]  Lu Chen,et al.  A method for discovering clusters of e-commerce interest patterns using click-stream data , 2015, Electron. Commer. Res. Appl..

[41]  Lakshminarayanan Subramanian,et al.  Sybil-Resilient Online Content Voting , 2009, NSDI.

[42]  Arindam Banerjee,et al.  Clickstream clustering using weighted longest common subsequences , 2001 .

[43]  Jie Li,et al.  Characterizing typical and atypical user sessions in clickstreams , 2008, WWW.

[44]  John Suler,et al.  The Bad Boys of Cyberspace: Deviant Behavior in a Multimedia Chat Community , 1998, Cyberpsychology Behav. Soc. Netw..

[45]  M. Tamer Özsu,et al.  A Web page prediction model based on click-stream tree representation of user behavior , 2003, KDD '03.

[46]  Ben Y. Zhao,et al.  User interactions in social networks and their implications , 2009, EuroSys '09.

[47]  Jeffrey Heer,et al.  Separating the swarm: categorization methods for user sessions on the web , 2002, CHI.

[48]  Michael Sirivianos,et al.  Aiding the Detection of Fake Accounts in Large Scale Social Online Services , 2012, NSDI.

[49]  Krishna P. Gummadi,et al.  An analysis of social network-based Sybil defenses , 2010, SIGCOMM '10.

[50]  Gang Wang,et al.  Social Turing Tests: Crowdsourcing Sybil Detection , 2012, NDSS.

[51]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[52]  Christos Faloutsos,et al.  Catching Synchronized Behaviors in Large Networks , 2016, ACM Trans. Knowl. Discov. Data.

[53]  Peter J. Rousseeuw,et al.  Finding Groups in Data: An Introduction to Cluster Analysis , 1990 .

[54]  Hongan Wang,et al.  Visualization of large hierarchical data by circle packing , 2006, CHI.

[55]  Michael Kaminsky,et al.  SybilGuard: Defending Against Sybil Attacks via Social Networks , 2008, IEEE/ACM Transactions on Networking.

[56]  Tovi Grossman,et al.  Patina: dynamic heatmaps for visualizing application usage , 2013, CHI.

[57]  Ben Y. Zhao,et al.  Understanding latent interactions in online social networks , 2010, TWEB.

[58]  Krishna P. Gummadi,et al.  The Many Shades of Anonymity: Characterizing Anonymous Social Media Content , 2021, ICWSM.

[59]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[60]  Kwan-Liu Ma,et al.  Visual cluster exploration of web clickstream data , 2012, 2012 IEEE Conference on Visual Analytics Science and Technology (VAST).

[61]  Danah Boyd,et al.  Detecting Spam in a Twitter Network , 2009, First Monday.

[62]  Ben Y. Zhao,et al.  Whispers in the dark: analysis of an anonymous social network , 2014, Internet Measurement Conference.

[63]  Stefan Savage,et al.  An analysis of underground forums , 2011, IMC '11.

[64]  Gianluca Stringhini,et al.  Detecting spammers on social networks , 2010, ACSAC '10.

[65]  Yi Li,et al.  In a World That Counts: Clustering and Detecting Fake Social Engagement at Scale , 2015, WWW.