Online Outlier Detection Based on Relative Neighbourhood Dissimilarity

Outlier detection has many practical applications, especially in domains where abnormal behavior matters, such as fraud detection, network intrusion detection, and medical diagnosis. In this paper, we present a technique for detecting outliers and learning from data in multi-dimensional streams. Because the underlying concept in streaming data may drift, learning approaches must be online and must adapt quickly. Our technique adapts to incoming data points and incrementally maintains the models it builds, which counters the effect of concept drift. Experimental results on real data sets show that our approach is effective at detecting outliers in data streams and at maintaining model accuracy.
