Diversity Aware-Based Sequential Ensemble Learning for Robust Anomaly Detection

Anomaly detection has become a popular topic in many domains because anomalies can carry valuable information. Recently, ensemble learning has been applied to improve the generalization ability of existing anomaly detection methods. In an anomaly detection ensemble, diversity among the members is essential for achieving strong performance. However, most studies rely on heuristic techniques to increase ensemble diversity, which generally yields only limited diversity. To obtain greater diversity, we propose a diversity aware-based sequential ensemble method (D-SEM) for anomaly detection. Specifically, the proposed method divides ensemble diversity into two parts: sample diversity and model diversity. For sample diversity, we use subsampling to generate a preliminary set of diverse training datasets. For model diversity, we design an ensemble-based optimization model that learns base classifiers with improved diversity. Furthermore, we propose an unsupervised diversity measure to quantify diversity, and we use an anomaly pruning strategy to successively eliminate pseudo-anomalies. By incorporating both sample diversity and model diversity, D-SEM achieves better generalization ability for anomaly detection. Experimental results on real-world datasets suggest that the proposed method outperforms various state-of-the-art methods.
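The abstract does not specify D-SEM's base classifiers, its optimization model, its diversity measure, or its exact pruning rule, so the sketch below only illustrates the general pattern it describes: subsampling for sample diversity, a sequential loop that prunes suspected (pseudo-)anomalies from later training sets, and an unsupervised score of how diverse the ensemble members are. It is not the authors' method. As stated assumptions, a k-th-nearest-neighbour distance stands in for the base learner, removing the top-scored fraction of the training pool stands in for the pruning strategy, and the mean pairwise (1 - Pearson correlation) between members' score vectors stands in for the diversity measure. All names and parameters (`sequential_subsample_ensemble`, `pairwise_diversity`, `sample_frac`, `prune_frac`) are hypothetical.

```python
import math
import random
from statistics import mean


def kth_nn_distance(data, i, reference, k):
    """Stand-in base detector (assumption): distance from point i to its k-th
    nearest neighbour among the reference indices, excluding i itself.
    Larger values mean "more anomalous"."""
    dists = sorted(math.dist(data[i], data[j]) for j in reference if j != i)
    return dists[min(k, len(dists)) - 1]


def sequential_subsample_ensemble(data, n_rounds=5, sample_frac=0.5, k=3,
                                  prune_frac=0.05, seed=0):
    """Hypothetical sequential ensemble: each round trains on a random
    subsample of the current training pool (sample diversity), scores every
    point, then prunes the top-scored pool points so later rounds train on
    data with fewer suspected anomalies (pseudo-anomaly pruning)."""
    rng = random.Random(seed)
    n = len(data)
    pool = list(range(n))      # indices still eligible for training
    score_runs = []            # one score vector (length n) per round
    for _ in range(n_rounds):
        size = min(len(pool), max(k + 1, int(sample_frac * len(pool))))
        sample = rng.sample(pool, size)
        scores = [kth_nn_distance(data, i, sample, k) for i in range(n)]
        score_runs.append(scores)
        n_prune = int(prune_frac * len(pool))
        if n_prune:
            ranked = sorted(pool, key=lambda i: scores[i], reverse=True)
            pruned = set(ranked[:n_prune])
            pool = [i for i in pool if i not in pruned]
    # Combine rounds by simple averaging (all rounds share one score scale).
    combined = [mean(run[i] for run in score_runs) for i in range(n)]
    return combined, score_runs


def pairwise_diversity(score_runs):
    """Hypothetical unsupervised diversity score: mean over all pairs of
    rounds of (1 - Pearson correlation) between their score vectors.
    0 means perfectly linearly correlated members; larger means more diverse."""
    def pearson(a, b):
        ma, mb = mean(a), mean(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        sa = math.sqrt(sum((x - ma) ** 2 for x in a))
        sb = math.sqrt(sum((y - mb) ** 2 for y in b))
        # Treat a constant score vector as perfectly correlated (no diversity).
        return cov / (sa * sb) if sa and sb else 1.0

    pairs = [(a, b) for idx, a in enumerate(score_runs)
             for b in score_runs[idx + 1:]]
    return mean(1.0 - pearson(a, b) for a, b in pairs) if pairs else 0.0


# Usage: 60 Gaussian points plus one isolated point at index 60.
demo_rng = random.Random(1)
demo = [(demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)) for _ in range(60)]
demo.append((8.0, 8.0))
combined, runs = sequential_subsample_ensemble(demo, n_rounds=4)
print("top-scored index:", max(range(len(demo)), key=combined.__getitem__))
print("diversity:", pairwise_diversity(runs))
```

Averaging is the simplest combination rule and is reasonable here only because every round uses the same detector; heterogeneous members would normally need their scores normalized before being combined.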
