Integration of deep feature extraction and ensemble learning for outlier detection

Abstract Most datasets do not contain exactly the same number of samples for each class. In some tasks, however, such as the detection of fraudulent transactions, the class imbalance is overwhelming and one of the classes accounts for a very small share of the samples (even less than 10% of the entire data). Such tasks often fall under outlier detection. Moreover, in some scenarios the outlier class may consist of multiple subsets, in which case the problem should be treated as the detection of multiple outlier types. In this article, we propose a system that handles all of these problems efficiently. We use stacked autoencoders to extract features and then an ensemble of probabilistic neural networks that detects outliers by majority voting. On most of the datasets tested, this system performs better and more reliably than the other outlier detection systems it was compared with, and the use of autoencoders clearly improves outlier detection performance.
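The voting stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the features have already been produced by a feature extractor (random 2-D points stand in for autoencoder features here), it models each probabilistic neural network as a Gaussian Parzen-window classifier, and it assumes the ensemble members differ only in their smoothing parameter `sigma`, which the abstract does not specify. The names `PNN`, `majority_vote`, and `ensemble` are hypothetical.

```python
import math
import random
from collections import Counter


class PNN:
    """Probabilistic neural network sketched as a Gaussian Parzen-window classifier."""

    def __init__(self, sigma):
        self.sigma = sigma  # smoothing parameter (kernel width); assumed, not from the paper

    def fit(self, X, y):
        # Pattern layer: store the training samples grouped by class label.
        self.patterns = {}
        for x, label in zip(X, y):
            self.patterns.setdefault(label, []).append(x)
        return self

    def predict_one(self, x):
        # Summation layer: mean Gaussian kernel activation for each class.
        def density(points):
            total = sum(
                math.exp(-math.dist(x, p) ** 2 / (2 * self.sigma ** 2))
                for p in points
            )
            return total / len(points)

        # Decision layer: the class with the largest estimated density wins.
        return max(self.patterns, key=lambda c: density(self.patterns[c]))


def majority_vote(members, x):
    """Return the label predicted by the largest number of ensemble members."""
    votes = Counter(m.predict_one(x) for m in members)
    return votes.most_common(1)[0][0]


# Toy, heavily imbalanced data: 200 inliers (label 0) and 10 outliers (label 1).
# These points are placeholders for features extracted upstream.
random.seed(0)
inliers = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
outliers = [(random.gauss(5, 0.5), random.gauss(5, 0.5)) for _ in range(10)]
X = inliers + outliers
y = [0] * len(inliers) + [1] * len(outliers)

# Assumption: ensemble members differ only in their kernel width.
ensemble = [PNN(sigma).fit(X, y) for sigma in (0.5, 1.0, 2.0)]

print(majority_vote(ensemble, (0.0, 0.0)))
print(majority_vote(ensemble, (5.0, 5.0)))
```

Because each class density is averaged over that class's own samples, the minority class is not swamped by the sheer number of inliers. Supporting several outlier types in this sketch would only require additional labels in `y`.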
