UAPD: Predicting Urban Anomalies from Spatial-Temporal Data

Urban environments are subject to disturbances that inconvenience their citizens. These disturbances require timely detection and resolution and, more importantly, timely preparedness on the part of city officials. We term these disturbances anomalies and ask whether it is possible to predict such events (proactive) rather than only detect them (reactive). While significant effort has gone into detecting anomalies in existing urban data, the prediction of future urban anomalies is much less well studied and understood. In this work, we formalize the problem of future anomaly prediction in urban environments so that anomalies can be addressed more efficiently and effectively. We develop the Urban Anomaly PreDiction (UAPD) framework, which addresses a number of challenges, including the dynamic and spatially varying nature of different categories of anomalies. Given the urban anomaly data to date, UAPD first detects the change point of each type of anomaly in the temporal dimension and then uses a tensor decomposition model to decouple the interrelations between the spatial and categorical dimensions. Finally, UAPD applies an autoregression method to predict which categories of anomalies will occur in each region in the future. We conduct extensive experiments in two urban environments, New York City and Pittsburgh. Experimental results demonstrate that UAPD outperforms alternative baselines across various settings, including different region and time-frame scales as well as diverse categories of anomalies. Code related to this chapter is available at: https://bitbucket.org/xianwu9/uapd.
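
The three stages described above (change point detection, tensor decomposition, autoregression) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, which lives in the repository linked above. Everything specific here is an assumption made for illustration: the least-squares mean-shift rule for the change point, the plain CP decomposition fitted by alternating least squares, the per-factor AR model, the threshold, the synthetic data, and all function names.

```python
import numpy as np

def change_point(series, min_tail=10):
    """Index that best splits a 1-D series into two constant-mean segments
    (assumed rule: minimize the summed squared error of both segments)."""
    n = len(series)
    best_t, best_cost = 0, np.sum((series - series.mean()) ** 2)
    for t in range(2, n - min_tail + 1):  # keep at least min_tail recent points
        left, right = series[:t], series[t:]
        cost = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t

def khatri_rao(a, b):
    """Column-wise Kronecker product; row index is (row of a, row of b)."""
    return np.einsum("ir,jr->ijr", a, b).reshape(-1, a.shape[1])

def cp_als(x, rank, n_iter=200, seed=0):
    """CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(n_iter):
        for mode in range(3):
            a, b = [factors[m] for m in range(3) if m != mode]
            unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
            gram = (a.T @ a) * (b.T @ b)
            factors[mode] = unfolded @ khatri_rao(a, b) @ np.linalg.pinv(gram)
    return factors

def ar_forecast(series, order=2):
    """One-step-ahead forecast from an AR(order) model fitted by least squares."""
    rows = np.array([series[t - order:t] for t in range(order, len(series))])
    design = np.column_stack([np.ones(len(rows)), rows])
    coef, *_ = np.linalg.lstsq(design, series[order:], rcond=None)
    return coef[0] + series[-order:] @ coef[1:]

def predict_next(x, rank=2, order=2, threshold=0.5, min_tail=10):
    """x has shape (regions, categories, time steps) and holds anomaly counts."""
    # Stage 1: one change point per category; simplification: keep only the
    # data after the latest of them so the tensor stays rectangular.
    per_category = x.sum(axis=0)
    start = max(change_point(s, min_tail) for s in per_category)
    recent = x[:, :, start:]
    # Stage 2: separate spatial, categorical, and temporal factors.
    regions, categories, times = cp_als(recent, rank)
    # Stage 3: forecast each temporal factor, then rebuild the next time slice.
    next_time = np.array([ar_forecast(times[:, r], order) for r in range(rank)])
    scores = np.einsum("ir,jr,r->ij", regions, categories, next_time)
    return scores, scores > threshold

# Synthetic example: 6 regions, 4 categories, 40 time steps, shift at t = 25.
rng = np.random.default_rng(1)
x = rng.poisson(1.0, size=(6, 4, 40)).astype(float)
x[:, :, 25:] += 2.0
scores, flags = predict_next(x)
print(scores.shape, flags.shape)
```

Each entry of `flags` marks a (region, category) pair whose forecast score exceeds the assumed threshold. Truncating every category at the latest change point is a simplification made to keep the sketch short; handling each category's own window would need a masked or weighted decomposition.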
