Autoencoders and Generative Adversarial Networks for Anomaly Detection for Sequences

We introduce synthetic oversampling in anomaly detection for multi-feature sequence datasets based on autoencoders and generative adversarial networks. The first approach considers the use of an autoencoder in conjunction with standard oversampling methods to generate synthetic data that captures the sequential nature of the data. A different model uses generative adversarial networks to generate structure preserving synthetic data for the minority class. We also use generative adversarial networks on the majority class as an outlier detection method for novelty detection. We show that the use of generative adversarial network based synthetic data improves classification model performance on a variety of sequence data sets.

[1]  Christopher Potts,et al.  Learning Word Vectors for Sentiment Analysis , 2011, ACL.

[2]  Stan Matwin,et al.  Addressing the Curse of Imbalanced Training Sets: One-Sided Selection , 1997, ICML.

[3]  Erik Marchi,et al.  A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4]  Lantao Yu,et al.  SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient , 2016, AAAI.

[5]  Yiming Yang,et al.  Kernel Change-point Detection with Auxiliary Deep Generative Models , 2019, ICLR.

[6]  Bernhard Schölkopf,et al.  Support Vector Method for Novelty Detection , 1999, NIPS.

[7]  Yang Wang,et al.  Cost-sensitive boosting for classification of imbalanced data , 2007, Pattern Recognit..

[8]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[9]  Nitesh V. Chawla,et al.  SMOTE: Synthetic Minority Over-sampling Technique , 2002, J. Artif. Intell. Res..

[10]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[11]  J. Ma,et al.  Time-series novelty detection using one-class support vector machines , 2003, Proceedings of the International Joint Conference on Neural Networks, 2003..

[12]  Léon Bottou,et al.  Wasserstein Generative Adversarial Networks , 2017, ICML.

[13]  Hui Han,et al.  Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning , 2005, ICIC.

[14]  See-Kiong Ng,et al.  Integrated Oversampling for Imbalanced Time Series Classification , 2013, IEEE Transactions on Knowledge and Data Engineering.

[15]  Andreas Dengel,et al.  Detection of Anomalies in Large Scale Accounting Data using Deep Autoencoder Networks , 2017, ArXiv.

[16]  M. Shyu,et al.  A Novel Anomaly Detection Scheme Based on Principal Component Classifier , 2003 .

[17]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[18]  Erik Marchi,et al.  Deep Recurrent Neural Network-Based Autoencoders for Acoustic Novelty Detection , 2017, Comput. Intell. Neurosci..

[19]  Francisco Herrera,et al.  A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[20]  Francesco Piazza,et al.  Acoustic novelty detection with adversarial autoencoders , 2017, 2017 International Joint Conference on Neural Networks (IJCNN).

[21]  Tao Zhang,et al.  Generative adversarial network based novelty detection usingminimized reconstruction error , 2018, Frontiers of Information Technology & Electronic Engineering.

[22]  Marek Lubicz,et al.  Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients , 2014, Appl. Soft Comput..

[23]  Haibo He,et al.  ADASYN: Adaptive synthetic sampling approach for imbalanced learning , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[24]  See-Kiong Ng,et al.  SPO: Structure Preserving Oversampling for Imbalanced Time Series Classification , 2011, 2011 IEEE 11th International Conference on Data Mining.

[25]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[26]  Chuan Sheng Foo,et al.  Efficient GAN-Based Anomaly Detection , 2018, ArXiv.

[27]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[28]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[29]  Heiko Hoffmann,et al.  Kernel PCA for novelty detection , 2007, Pattern Recognit..

[30]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.