Paper information - Identifying hotspots on the real-time web

We study the problem of automatically identifying "hotspots" on the real-time web. Concretely, we propose to identify highly dynamic, ad hoc collections of users, which we refer to as crowds, in massive social messaging systems like Twitter and Facebook. The proposed approach clusters users by their message-based communication over time-evolving graphs, capturing the conversational nature of social messaging systems. A salient feature is an efficient locality-based clustering method that identifies crowds of users in near real time, in contrast to more heavyweight static clustering algorithms. Based on a three-month snapshot of Twitter consisting of 711,612 users and 61.3 million messages, we show that the proposed approach identifies Twitter-based crowds efficiently and effectively, at a fraction of the computational cost of static graph clustering techniques.
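The idea of clustering users over a time-evolving communication graph can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify the edge weighting or the clustering rule, so the exponential decay, the weight threshold, and the function names `update_graph` and `find_crowds` are all assumptions made for illustration. For brevity the sketch also recomputes connected components over the whole graph at each step, whereas the abstract describes a locality-based method.

```python
from collections import defaultdict


def update_graph(weights, messages, decay=0.5):
    """Advance one time step: decay old edge weights, then add new messages.

    weights  -- dict mapping an undirected edge (u, v) with u < v to a float
    messages -- iterable of (sender, receiver) pairs seen in this time step
    decay    -- hypothetical decay factor applied to every existing edge
    """
    updated = defaultdict(float)
    for edge, weight in weights.items():
        updated[edge] = weight * decay
    for sender, receiver in messages:
        if sender == receiver:
            continue  # a self-message does not connect two users
        edge = tuple(sorted((sender, receiver)))
        updated[edge] += 1.0
    return updated


def find_crowds(weights, threshold=1.0):
    """Return crowds as connected components over edges with weight >= threshold."""
    adjacency = defaultdict(set)
    for (u, v), weight in weights.items():
        if weight >= threshold:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, crowds = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        crowds.append(component)
    return crowds


# Three time steps of made-up (sender, receiver) messages.
steps = [
    [("a", "b"), ("b", "c")],
    [("a", "b"), ("d", "e")],
    [("d", "e")],
]
weights = {}
for t, messages in enumerate(steps):
    weights = update_graph(weights, messages)
    crowds = sorted(sorted(c) for c in find_crowds(weights))
    print(t, crowds)
# prints:
# 0 [['a', 'b', 'c']]
# 1 [['a', 'b'], ['d', 'e']]
# 2 [['d', 'e']]
```

In this toy run, the crowd around `a`, `b`, and `c` shrinks and then dissolves once its members stop exchanging messages, while `d` and `e` form a new crowd. That is the kind of short-lived grouping the abstract calls a crowd.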

James Caverlee | Krishna Yeswanth Kamath
