Ring: Real-Time Emerging Anomaly Monitoring System Over Text Streams

Microblog platforms have become extremely popular in the big data era because of their real-time diffusion of information. It is important to know which anomalous events are trending on a social network, to monitor their evolution, and to find related anomalies. In this paper we demonstrate <sc>Ring</sc>, a <underline>r</underline>eal-t<underline>i</underline>me emerging a<underline>n</underline>omaly monitorin<underline>g</underline> system over microblog text streams. <sc>Ring</sc> integrates our efforts in both emerging anomaly monitoring research and system research. From the anomaly monitoring perspective, <sc>Ring</sc> proposes a graph analytic approach such that <sc>Ring</sc> (1) detects emerging anomalies at an earlier stage than existing methods, (2) is among the first to discover correlations among emerging anomalies in a streaming fashion, and (3) monitors anomaly evolution in real time at different time scales, from minutes to months. From the system research perspective, <sc>Ring</sc> (1) optimizes the time-ranged keyword query performance of a full-text search engine to improve the efficiency of monitoring anomaly evolution, and (2) improves the dynamic graph processing performance of Spark and implements our graph stream model on it. As a result, <sc>Ring</sc> is able to process data at the scale of the entire Weibo or Twitter text stream with linear horizontal scalability. The system shows its advantages over existing systems and methods for the emerging event monitoring task, from both the event monitoring perspective and the system perspective.
