Identifying Buzzing Stories via Anomalous Temporal Subgraph Discovery

Story identification from online user-generated content has recently attracted increasing attention. Existing approaches fall into two categories. Approaches in the first category extract stories as cohesive substructures in a graph that represents the strength of association between terms. Approaches in the second category analyze the temporal evolution of individual terms and identify stories by grouping terms with similar anomalous temporal behavior. Both categories have limitations. In this work we advance the literature on story identification by devising a novel method that combines the distinctive features of the two main existing approaches, thereby also addressing their weaknesses. Experiments on a dataset extracted from a real-world web-search log demonstrate the superiority of the proposed method over the state of the art.
