Voting-Based Ensemble of Unsupervised Outlier Detectors

Datasets may contain small sets of data objects whose characteristics deviate from those of the majority of objects in the dataset. These objects, which are distinct from noise and may carry valuable information, are called outliers. Outlier detection is an active research topic in many fields, for example detecting malware in cyber security, finding fraudulent financial transactions, identifying defects in industrial products, and detecting abnormalities in health data. Researchers have developed many application-specific methods for detecting outliers and a few generic ones. These methods can be grouped into unsupervised, supervised and semi-supervised methods according to the availability of class labels. In this paper, we evaluate the performance of three outlier detection algorithms on real-world datasets: one-class SVM, elliptic envelope and local outlier factor. To improve performance, the three algorithms are combined into an ensemble using a voting mechanism. The influence of dimensionality reduction on the proposed ensemble method is also studied. Experiments on publicly available datasets show that the proposed technique outperforms the individual outlier detectors.
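The voting idea can be illustrated with a minimal sketch built on scikit-learn. The abstract does not specify the voting rule or any hyperparameters, so several things here are assumptions for illustration only: the simple majority vote (a point is an outlier when at least two of the three detectors flag it), the function name majority_vote_outliers, the contamination value, and the synthetic data. It is not the authors' implementation.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def majority_vote_outliers(X, contamination=0.05):
    """Label each row of X as +1 (inlier) or -1 (outlier) by majority vote.

    Hypothetical helper: the voting rule and the hyperparameters are
    assumptions, not taken from the paper.
    """
    detectors = [
        OneClassSVM(nu=contamination, gamma="scale"),
        EllipticEnvelope(contamination=contamination, random_state=0),
        LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    ]
    # Each detector's fit_predict returns +1 for inliers and -1 for outliers.
    votes = np.stack([d.fit_predict(X) for d in detectors])
    # The sum of three +/-1 votes is never zero; a negative sum means
    # at least two detectors flagged the point.
    return np.where(votes.sum(axis=0) < 0, -1, 1)


# Synthetic example (assumed data): a Gaussian cluster plus a few distant points.
rng = np.random.default_rng(0)
inliers = rng.normal(size=(200, 2))
outliers = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X = np.vstack([inliers, outliers])

labels = majority_vote_outliers(X)
print("points flagged as outliers:", int((labels == -1).sum()))
```

All three detectors are fitted without labels, which matches the unsupervised setting described above. A dimensionality reduction step could be applied to X before calling the function to study its influence on the ensemble.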
