Spatio-temporal outlier detection algorithms based on computing behavioral outlierness factor

Abstract A major task in spatio-temporal outlier detection is to identify objects that exhibit abnormal behavior spatially, temporally, or both. Only a few algorithms have been proposed for detecting spatial and/or temporal outliers. One example is the Local Density-Based Spatial Clustering of Applications with Noise (LDBSCAN). Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is mainly a clustering method; it only tells us whether an object belongs to a cluster or is an outlier. The Local Outlier Factor (LOF) instead assigns each object a quantitative measure of outlierness, where a high LOF score marks the object as a potential outlier. The LDBSCAN algorithm, which combines these two notions, considers only the spatial context. Furthermore, it defeats the notion of a cluster (i.e., LDBSCAN may report clusters with fewer than the minimum required number of points), and it may miss some outliers because of limitations in the conditions it applies. In this paper, we propose two algorithms, namely Spatio-Temporal Behavioral Density-based Clustering of Applications with Noise (ST-BDBCAN) and Approx-ST-BDBCAN. ST-BDBCAN adopts a new concept, the Spatio-Temporal Behavioral Outlier Factor (ST-BOF), which is a spatio-temporal extension of LOF. It uses spatial and temporal attributes simultaneously to define the context, so that the relative importance of spatial continuity and temporal continuity can be set to suit the application at hand. The Approx-ST-BDBCAN algorithm improves scalability, with minimal loss of detection accuracy, by partitioning data points for parallel processing. Experimental results on synthetic and buoy datasets suggest that the proposed algorithms are accurate and computationally efficient. Additionally, new Outlier Association with Hurricane Intensity Index (OAHII) measures are introduced for quantitative evaluation of the results on the buoy dataset.
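
To make the LOF idea in the abstract concrete, the following is a minimal sketch of the standard LOF computation applied to points with a spatial position and a timestamp. It is not the paper's ST-BOF definition, which the abstract does not give. The function names (`st_distance`, `lof_scores`), the weights `w_spatial` and `w_temporal`, and the toy data are all assumptions introduced here for illustration; the weighted distance is only one plausible way to express the relative importance of spatial and temporal continuity.

```python
import math

def st_distance(a, b, w_spatial=1.0, w_temporal=1.0):
    """Weighted distance between two (x, y, t) points.

    Hypothetical weighting, not taken from the paper: the weights let the
    caller trade off spatial closeness against temporal closeness.
    """
    ds = math.hypot(a[0] - b[0], a[1] - b[1])  # spatial gap
    dt = abs(a[2] - b[2])                      # temporal gap
    return math.sqrt((w_spatial * ds) ** 2 + (w_temporal * dt) ** 2)

def lof_scores(points, k, dist):
    """Standard LOF score for every point, using exactly k neighbors.

    Simplifications (assumptions of this sketch): ties at the k-th distance
    are cut off by index order, and all points are assumed to be distinct.
    """
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        ranked = sorted(
            (dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbors.append([j for _, j in ranked[:k]])
        k_dist.append(ranked[k - 1][0])  # distance to the k-th neighbor

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist(points[i], points[j])) for j in neighbors[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average neighbor density relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Toy data: a 3 x 3 grid observed at time 0, plus one distant point.
points = [(float(x), float(y), 0.0) for x in range(3) for y in range(3)]
points.append((10.0, 10.0, 0.0))

scores = lof_scores(points, k=3, dist=st_distance)
most_outlying = max(range(len(points)), key=lambda i: scores[i])
print(points[most_outlying], round(scores[most_outlying], 2))
```

Points inside the grid have a local density close to that of their neighbors, so their scores stay near 1, while the distant point has a much lower density than its neighbors and therefore a much higher score. Raising `w_temporal` relative to `w_spatial` in this sketch would make observations that are close in space but far apart in time look less like neighbors.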
