Integrating community matching and outlier detection for mining evolutionary community outliers

Temporal datasets, in which data evolves continuously, exist in a wide variety of applications, and identifying anomalous or outlying objects from temporal datasets is an important and challenging task. Different from traditional outlier detection, which detects objects that have quite different behavior compared with the other objects, temporal outlier detection tries to identify objects that have different evolutionary behavior compared with other objects. Usually objects form multiple communities, and most of the objects belonging to the same community follow similar patterns of evolution. However, there are some objects which evolve in a very different way relative to other community members, and we define such objects as evolutionary community outliers. This definition represents a novel type of outliers considering both temporal dimension and community patterns. We investigate the problem of identifying evolutionary community outliers given the discovered communities from two snapshots of an evolving dataset. To tackle the challenges of community evolution and outlier detection, we propose an integrated optimization framework which conducts outlier-aware community matching across snapshots and identification of evolutionary outliers in a tightly coupled way. A coordinate descent algorithm is proposed to improve community matching and outlier detection performance iteratively. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting evolutionary community outliers.

[1]  Yvan Saeys,et al.  Java-ML: A Machine Learning Library , 2009, J. Mach. Learn. Res..

[2]  Philip S. Yu,et al.  Combining multiple clusterings by soft correspondence , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[3]  Philip S. Yu,et al.  Outlier detection in graph streams , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[4]  Raymond T. Ng,et al.  Distance-based outliers: algorithms and applications , 2000, The VLDB Journal.

[5]  Yizhou Sun,et al.  On community outliers and their efficient detection in information networks , 2010, KDD.

[6]  Sridhar Ramaswamy,et al.  Efficient algorithms for mining outliers from large data sets , 2000, SIGMOD '00.

[7]  William W. Cohen,et al.  Learning to match and cluster large high-dimensional data sets for data integration , 2002, KDD.

[8]  Eleazar Eskin,et al.  Anomaly Detection over Noisy Data using Learned Probability Distributions , 2000, ICML.

[9]  Marcia Goldfarb Computer analysis of two-dimensional gels. , 2007, Journal of biomolecular techniques : JBT.

[10]  Philip S. Yu,et al.  Outlier Detection with Uncertain Data , 2008, SDM.

[11]  ShimKyuseok,et al.  Efficient algorithms for mining outliers from large data sets , 2000 .

[12]  Hui Xiong,et al.  Top-Eye: top-k evolving trajectory outlier detection , 2010, CIKM.

[13]  Sandrine Dudoit,et al.  Bagging to Improve the Accuracy of A Clustering Procedure , 2003, Bioinform..

[14]  M. Otto,et al.  Outliers in Time Series , 1972 .

[15]  Philip S. Yu,et al.  Outlier detection for high dimensional data , 2001, SIGMOD '01.

[16]  Raymond T. Ng,et al.  Algorithms for Mining Distance-Based Outliers in Large Datasets , 1998, VLDB.

[17]  Vladimir Pavlovic,et al.  Discovering clusters in motion time-series data , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[18]  Jae-Gil Lee,et al.  Trajectory Outlier Detection: A Partition-and-Detect Framework , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[19]  Hans-Peter Kriegel,et al.  LOF: identifying density-based local outliers , 2000, SIGMOD '00.

[20]  N. Xuong,et al.  Computer analysis of two-dimensional gels. , 1981, Analytical biochemistry.

[21]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[22]  Kurt Hornik,et al.  Voting-Merging: An Ensemble Method for Clustering , 2001, ICANN.

[23]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[24]  Srinivasan Parthasarathy,et al.  LOADED: link-based outlier and anomaly detection in evolving data sets , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[25]  Yizhou Sun,et al.  iTopicModel: Information Network-Integrated Topic Modeling , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[26]  Mark J. Miller,et al.  Computer analysis of two‐dimensional gels: Automatic matching , 1984 .

[27]  Yizhou Sun,et al.  Graph-based Consensus Maximization among Multiple Supervised and Unsupervised Models , 2009, NIPS.

[28]  Noah A. Rosenberg,et al.  CLUMPP: a cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure , 2007, Bioinform..

[29]  Wenjie Hu,et al.  Robust Anomaly Detection Using Support Vector Machines , 2003 .

[30]  Ying Sun,et al.  Motion Estimation Via Cluster Matching , 1994, IEEE Trans. Pattern Anal. Mach. Intell..

[31]  Hans-Peter Kriegel,et al.  Interpreting and Unifying Outlier Scores , 2011, SDM.

[32]  Hans-Peter Kriegel,et al.  LoOP: local outlier probabilities , 2009, CIKM.

[33]  Aleksandar Lazarevic,et al.  Incremental Local Outlier Detection for Data Streams , 2007, 2007 IEEE Symposium on Computational Intelligence and Data Mining.