Graph-based approach for outlier detection in sequential data and its application on stock market and weather data

Outlier detection has a wide variety of applications, ranging from detecting intrusions in a computer network, to forecasting hurricanes and tornadoes from weather data, to identifying indicators of a potential crisis in stock market data. The problem of finding outliers in sequential data has been widely studied in the data mining literature, and many techniques have been developed to tackle it in various application domains. However, many of these techniques rely on the peculiar characteristics of a specific type of data to detect the outliers. As a result, they cannot be easily applied to different types of data in other application domains; at a minimum, they must be tuned and customized to adapt to the new domain. They may also need a certain amount of training data to build their models, which makes them hard to apply when only a limited amount of data is available. The work described in this paper tackles the problem by proposing a graph-based approach for the discovery of contextual outliers in sequential data. The developed algorithm offers a higher degree of flexibility and requires less information about the nature of the analyzed data than the previous approaches described in the literature. To validate our approach, we conducted experiments on stock market and weather data and compared the results with those from our previous work. Our analysis of the results demonstrates that the algorithm proposed in this paper is successful and effective in detecting outliers in data from two different domains, one financial and the other meteorological.
