Detecting Anomalies in Sequential Data with Higher-order Networks

A major branch of anomaly detection methods relies on dynamic networks: raw sequential data is first converted to a series of networks, then critical change points are identified in the evolving network structure. However, existing approaches use the first-order network (FON) to represent the underlying raw data, which may lose important higher-order sequence patterns and make higher-order anomalies undetectable in subsequent analysis. By replacing the FON with higher-order networks (HONs), we show that existing anomaly detection algorithms can better capture higher-order anomalies that may otherwise be ignored. We show that the existing HON construction algorithm cannot be used for the anomaly detection task because of its extra parameters and poor scalability; we introduce a parameter-free algorithm that constructs HONs from big data sets. Using a large-scale synthetic data set with 11 billion web clickstreams, we demonstrate how the proposed method can capture anomalies of variable orders. Using real-world taxi trajectory data, we show how the proposed method amplifies higher-order anomaly signals. Finally, we provide complexity analysis and benchmarking to show how one can incorporate higher-order dependencies with a small overhead.
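To illustrate why a first-order representation can hide a higher-order change, the following minimal sketch compares two time windows of toy sequences. It is not the construction algorithm proposed here: the function names, the toy data, and the fixed second-order representation are all assumptions made for illustration, whereas the proposed method uses variable orders and is parameter-free.

```python
from collections import Counter

def first_order_edges(sequences):
    """Count pairwise transitions (src -> dst), as a first-order network would."""
    edges = Counter()
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            edges[(src, dst)] += 1
    return edges

def second_order_edges(sequences):
    """Count transitions conditioned on the previous step: (prev, cur) -> nxt.
    A fixed order of 2 is an illustrative simplification."""
    edges = Counter()
    for seq in sequences:
        for prev, cur, nxt in zip(seq, seq[1:], seq[2:]):
            edges[((prev, cur), nxt)] += 1
    return edges

# Hypothetical toy data: in window 1, traffic arriving at "c" from "a" goes on
# to "d" and traffic from "b" goes on to "e"; in window 2 the pattern is swapped.
window_1 = [["a", "c", "d"], ["b", "c", "e"]] * 10
window_2 = [["a", "c", "e"], ["b", "c", "d"]] * 10

# Pairwise transition counts are identical in both windows, so a detector
# working on first-order edge weights sees no change.
fon_changed = first_order_edges(window_1) != first_order_edges(window_2)

# Conditioning on the previous node exposes the swap.
hon_changed = second_order_edges(window_1) != second_order_edges(window_2)

print("first-order network changed: ", fon_changed)
print("second-order network changed:", hon_changed)
```

In this toy case every pairwise count is the same across the two windows, so only the representation that keeps the previous step registers a difference.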
