We study the problem of automatically identifying ``hotspots'' on the real-time web. Concretely, we propose to identify highly dynamic, ad hoc collections of users, which we refer to as crowds, in massive social messaging systems like Twitter and Facebook. Our approach clusters users by their message-based communication over time-evolving graphs, capturing the natural conversational nature of social messaging systems. A salient feature is an efficient locality-based clustering method that identifies crowds of users in near real time, in contrast to more heavyweight static clustering algorithms. Using a three-month snapshot of Twitter consisting of 711,612 users and 61.3 million messages, we show that the proposed approach identifies Twitter-based crowds effectively relative to static graph clustering techniques, at a fraction of the computational cost.