Anomaly Detection in the Dynamics of Web and Social Networks Using Associative Memory

In this work, we propose a new, fast, and scalable method for anomaly detection in large time-evolving graphs. The input may be a static graph with dynamic node attributes (e.g., time series) or a graph that itself evolves in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised and is able to detect and track anomalous activity in a dynamic network despite noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information, and we show that anomalies can be spotted with good precision using a memory network. The approach is scalable, and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: the Enron email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by real-world events that impact the network dynamics. Moreover, the structure of the clusters and the analysis of the time evolution associated with the detected events reveal interesting facts about how humans interact, exchange, and search for information, opening the door to new quantitative studies of collective and social behavior on large, dynamic datasets.
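
To make the idea of combining graph and time information with a Hopfield-style memory more concrete, here is a minimal sketch of one way a Hebbian learning rule could be applied along the edges of a graph. It is an illustration under stated assumptions, not the distributed implementation mentioned above: the burst criterion (mean plus a multiple of the standard deviation), the function names `burst_mask`, `hebbian_edge_weights`, and `anomalous_clusters`, the thresholds, and the toy data are all hypothetical. Edges whose endpoints burst at the same time steps are reinforced, and the connected components of the reinforced edges are reported as candidate anomalous clusters.

```python
import numpy as np


def burst_mask(activity, z_thresh=2.0):
    """Flag time steps where a node's activity exceeds mean + z_thresh * std.

    activity: array of shape (n_nodes, n_steps). The criterion is an assumption.
    """
    mean = activity.mean(axis=1, keepdims=True)
    std = activity.std(axis=1, keepdims=True)
    return activity > mean + z_thresh * std


def hebbian_edge_weights(edges, bursts):
    """Hebbian-style weight per edge: number of steps where both endpoints burst."""
    return {(u, v): int(np.sum(bursts[u] & bursts[v])) for u, v in edges}


def anomalous_clusters(n_nodes, weights, min_weight=1):
    """Connected components of the graph restricted to edges with weight >= min_weight."""
    adjacency = {node: set() for node in range(n_nodes)}
    for (u, v), w in weights.items():
        if w >= min_weight:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, clusters = set(), []
    for start in range(n_nodes):
        if start in seen or not adjacency[start]:
            continue  # skip visited nodes and nodes with no reinforced edge
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Toy data (hypothetical): 5 nodes on a path graph, 20 time steps, baseline activity 1.
activity = np.ones((5, 20))
activity[[0, 1, 2], 10:12] = 10.0  # nodes 0, 1, 2 spike together at steps 10 and 11
activity[3, 3] = 10.0              # node 3 spikes alone at an unrelated time
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]

bursts = burst_mask(activity)
weights = hebbian_edge_weights(edges, bursts)
clusters = anomalous_clusters(n_nodes=5, weights=weights)
print(weights)
print(clusters)
```

In this toy setting, the isolated spike of node 3 does not reinforce any edge because none of its neighbors is active at the same time, which is the sense in which the graph structure helps filter out unrelated noise.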
