Identifying hotspots on the real-time web

We study the problem of automatically identifying ``hotspots'' on the real-time web. Concretely, we propose to identify highly dynamic, ad hoc collections of users -- what we refer to as crowds -- in massive social messaging systems like Twitter and Facebook. The proposed approach clusters users by their message-based communication over time-evolving graphs, capturing the conversational nature of social messaging systems. A salient feature of the approach is an efficient locality-based clustering method that identifies crowds of users in near real-time, in contrast to more heavyweight static clustering algorithms. Based on a three-month snapshot of Twitter consisting of 711,612 users and 61.3 million messages, we show that the proposed approach identifies Twitter-based crowds efficiently and effectively, at a fraction of the computational cost of static graph clustering techniques.
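
To make the idea of locality-based clustering over a time-evolving communication graph more concrete, the following is a minimal Python sketch. It is not the algorithm evaluated in this work: the class name `CommunicationGraph`, the exponential decay with a `half_life` parameter, the `threshold` value, and the use of a breadth-first search over strong edges are all assumptions chosen for illustration. The point it shows is that a user's crowd can be recovered by exploring only that user's neighborhood, rather than by re-clustering the whole graph.

```python
from collections import defaultdict, deque
import math


class CommunicationGraph:
    """Illustrative time-decayed message graph (hypothetical, not the paper's method)."""

    def __init__(self, half_life=3600.0, threshold=1.5):
        # Assumed model: an edge weight halves every `half_life` time units.
        self.decay = math.log(2) / half_life
        # Assumed rule: an edge joins two users into a crowd if its weight >= threshold.
        self.threshold = threshold
        # user -> {neighbor: (weight at last update, time of last update)}
        self.edges = defaultdict(dict)

    def _weight(self, u, v, now):
        """Current decayed weight of edge (u, v); 0.0 if the edge does not exist."""
        w, t = self.edges[u].get(v, (0.0, now))
        return w * math.exp(-self.decay * (now - t))

    def add_message(self, sender, receiver, now):
        """Record one message: decay the old weight to `now`, then add 1."""
        w = self._weight(sender, receiver, now) + 1.0
        self.edges[sender][receiver] = (w, now)
        self.edges[receiver][sender] = (w, now)

    def crowd_of(self, user, now):
        """Users reachable from `user` through edges that are still strong at `now`.

        Only the neighborhood of `user` is explored, so the cost depends on
        the size of the crowd rather than on the size of the whole graph.
        """
        seen = {user}
        queue = deque([user])
        while queue:
            u = queue.popleft()
            for v in list(self.edges[u]):
                if v not in seen and self._weight(u, v, now) >= self.threshold:
                    seen.add(v)
                    queue.append(v)
        return seen
```

In this sketch, two users who exchange several messages in quick succession end up in the same crowd, and the crowd dissolves on its own once the conversation goes quiet and the edge weights decay below the threshold.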
