MSTREAM: Fast Anomaly Detection in Multi-Aspect Streams

Given a stream of entries in a multi-aspect data setting, i.e., entries having multiple dimensions, how can we detect anomalous activities in an unsupervised manner? For example, in the intrusion detection setting, existing work seeks to detect anomalous events or edges in dynamic graph streams, but this does not allow us to take into account additional attributes of each entry. Our work aims to define a streaming multi-aspect data anomaly detection framework, termed MSTREAM, which can detect unusual group anomalies dynamically as they occur. MSTREAM has the following properties: (a) it detects anomalies in multi-aspect data including both categorical and numeric attributes; (b) it is online, processing each record in constant time and constant memory; (c) it can capture the correlation between multiple aspects of the data. MSTREAM is evaluated on the KDDCUP99, CICIDS-DoS, UNSW-NB 15 and CICIDS-DDoS datasets, and outperforms state-of-the-art baselines.
