Analyzing and Predicting Spatial Crime Distribution Using Crowdsourced and Open Data

Data analytics has an ever-increasing impact on tackling societal challenges. In this article, we investigate how data from several heterogeneous online sources can be used to discover insights and make predictions about the spatial distribution of crime in large urban environments. We address a series of research questions, following a purely data-driven methodology. First, we examine how useful different types of data are for predicting crime levels, focusing especially on how prediction accuracy can be improved by combining data from multiple information sources. To that end, we not only investigate prediction accuracy across all individual areas studied, but also examine how these predictions affect the accuracy of identified crime hotspots. Then, we look into individual features, aiming to identify and quantify the most important factors. Finally, we drill down to different crime types, elaborating on how prediction accuracy and the importance of individual features vary across them. Our analysis involves six different datasets, from which more than 3,000 features are extracted, filtered, and used to learn models for predicting crime rates across 14 different crime categories. Our results indicate that combining data from multiple information sources can significantly improve prediction accuracy. They also highlight which features affect prediction accuracy the most, as well as the crime categories for which predictions are most accurate.
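To make the notion of hotspot accuracy concrete, the following is a minimal sketch of one possible way to score it: the share of the true top-k areas that also appear among the predicted top-k. It is an illustration under stated assumptions, not the evaluation procedure used in the article; the function names, the top-k definition of a hotspot, and the per-area counts are all hypothetical.

```python
def top_k_areas(counts, k):
    """Return the set of the k area ids with the highest counts.

    Ties are broken by area id so the result is deterministic.
    """
    ranked = sorted(counts, key=lambda area: (-counts[area], area))
    return set(ranked[:k])


def hotspot_hit_rate(actual, predicted, k):
    """Fraction of the true top-k areas that also rank in the predicted top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not actual:
        raise ValueError("actual counts must not be empty")
    true_hot = top_k_areas(actual, k)
    pred_hot = top_k_areas(predicted, k)
    return len(true_hot & pred_hot) / len(true_hot)


# Hypothetical crime counts per area (illustrative values only).
actual = {"A": 120, "B": 95, "C": 40, "D": 12, "E": 5}
predicted = {"A": 110, "B": 50, "C": 70, "D": 15, "E": 3}

rate = hotspot_hit_rate(actual, predicted, k=2)
print(f"Top-2 hotspot hit rate: {rate:.2f}")
```

A score of this kind rewards a model for ranking the worst-affected areas correctly even when its per-area error is not small, which is why hotspot accuracy is worth examining separately from accuracy across all areas.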
