An Experimental Analysis of Fraud Detection Methods in Enterprise Telecommunication Data using Unsupervised Outlier Ensembles

This work uses outlier ensembles to detect fraudulent calls in telephone communication logs from the network of POST Luxembourg. Outlier detection on high-dimensional data is challenging, and developing a sufficiently robust approach is of paramount importance for automatically identifying unexpected events. For real-world business applications it is important to obtain a robust detection method, i.e. one that performs well on different types of data, to ensure that the method will not impact the business in unexpected ways. Many factors affect the robustness of an outlier detection approach, and this experimental analysis exposes these factors in the context of outlier ensembles using feature bagging. Real-world problems demand knowledge of candidate approaches and a way to select the best-performing one, typically via a train-test split of labeled data. In the unsupervised setup, performance information is unavailable during the learning phase, so such a selection cannot be made then; it is therefore important to know beforehand how performance is affected. This analysis demonstrates that, despite their collective power, outlier ensembles are still affected by i) data normalization schemes, ii) combination functions, and iii) the underlying outlier detection algorithms.
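A minimal sketch of the feature-bagging ensemble scheme described above, with the two combination functions the analysis varies (averaging vs. maximum of normalized scores). This is an illustration, not the paper's implementation: the k-NN-distance base detector, the min-max score normalization, and all function names are assumptions chosen for brevity.

```python
import numpy as np

def knn_outlier_scores(X, k=5):
    # Base detector: score each point by its distance to its k-th
    # nearest neighbor (larger distance = more outlying).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d_sorted = np.sort(d, axis=1)
    return d_sorted[:, k]  # column 0 is the zero distance to self

def feature_bagging_scores(X, n_rounds=10, k=5, combine="avg", seed=0):
    # Feature bagging: run the base detector on several random
    # feature subspaces, then combine the per-round scores.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    all_scores = []
    for _ in range(n_rounds):
        m = int(rng.integers(d // 2, d + 1))      # subspace size
        feats = rng.choice(d, size=m, replace=False)
        s = knn_outlier_scores(X[:, feats], k=k)
        # Min-max normalization so scores from different subspaces
        # are comparable before combining (one of the factors the
        # analysis shows the ensemble is sensitive to).
        s = (s - s.min()) / (s.max() - s.min() + 1e-12)
        all_scores.append(s)
    S = np.vstack(all_scores)
    # Combination function: average or maximum across rounds.
    return S.mean(axis=0) if combine == "avg" else S.max(axis=0)
```

Swapping the normalization step, the `combine` argument, or the base detector changes the ranking the ensemble produces, which is exactly the sensitivity the abstract refers to.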
