Anomaly Detection from Incomplete Data

Anomaly detection (also known as outlier or burst detection) is a well-motivated problem and a major data mining and knowledge discovery task. In this article, we study the problem of population anomaly detection, one of the key issues in event monitoring and population management within a city. By tracing and analyzing detected population anomalies, we can inform city traffic design as well as event impact analysis and prediction. Although the problem is significant, detecting population anomalies and retrieving anomaly trajectories is very hard, especially because actual and sufficient population data are difficult to obtain. To address the lack of real population data, we take advantage of mobile phone networks, which offer enormous amounts of spatial and temporal communication data about individuals. More importantly, we claim that these mobile phone data can be used to infer and approximate population data. Thus, we can study the population anomaly detection problem by taking advantage of unique features hidden in mobile phone data. In this article, we present a system to conduct Population Anomaly Detection (PAD). First, we propose an effective clustering method, correlation-based clustering, to cluster the incomplete location information from mobile phone data (i.e., to go from the mobile call volume distribution to the population density distribution). Then, we design an adaptive parameter-free detection method, R-scan, to capture the distributed dynamic anomalies. Finally, we devise an efficient algorithm, BT-miner, to retrieve anomaly trajectories. Experimental results on real-life mobile phone data confirm the effectiveness and efficiency of the proposed algorithms. The proposed methods have also been realized as a pilot system in a city in China.
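The abstract does not spell out how correlation-based clustering works, so the following is only a minimal sketch of the general idea of grouping locations by how similarly their call volumes move over time. It is not the authors' algorithm. The tower names, the call-volume numbers, the 0.9 threshold, and the functions `pearson` and `correlation_groups` are all hypothetical and chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def correlation_groups(series_by_tower, threshold=0.9):
    """Group towers whose call-volume series correlate at or above `threshold`.

    Groups are the connected components of the "highly correlated" graph,
    found with a small union-find. This is an illustrative stand-in, not
    the clustering method proposed in the article.
    """
    parent = {tower: tower for tower in series_by_tower}

    def find(tower):
        # Follow parent links to the root, compressing the path on the way.
        while parent[tower] != tower:
            parent[tower] = parent[parent[tower]]
            tower = parent[tower]
        return tower

    for a, b in combinations(series_by_tower, 2):
        if pearson(series_by_tower[a], series_by_tower[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for tower in series_by_tower:
        groups.setdefault(find(tower), []).append(tower)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical hourly call volumes for four towers (made-up numbers).
call_volume = {
    "tower_A": [10, 20, 30, 40, 50],
    "tower_B": [12, 22, 29, 41, 52],
    "tower_C": [50, 40, 30, 20, 10],
    "tower_D": [48, 41, 33, 19, 11],
}
groups = correlation_groups(call_volume)
print(groups)
```

In this toy input, towers A and B rise together while C and D fall together, so the sketch places them in two separate groups. A real system would also have to handle missing observations and noise, which this sketch ignores.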
