A multicriteria optimization framework for the definition of the spatial granularity of urban social media analytics

ABSTRACT

The spatial analysis of social media data has recently emerged as a significant source of knowledge for urban studies. Most of these analyses rely on an areal unit chosen without clear criteria for ensuring that it is representative of the observed phenomenon. However, the results and conclusions that can be drawn from a social media analysis depend largely on the chosen areal unit, because such analyses are subject to the well-known Modifiable Areal Unit Problem (MAUP). To address this problem, this article adopts a data-driven approach to determine the most suitable areal unit for analyzing social media data. Our multicriteria optimization framework relies on Pareto optimality to assess candidate areal units against a set of user-defined criteria. In a case study of rainfall-related tweets, we determine the areal units that optimize spatial autocorrelation patterns by combining indicators of global spatial autocorrelation with the variance of local spatial autocorrelation. The results show that the optimal areal units (30 km² and 50 km²) yield more consistent spatial patterns than the other candidate areal units and are therefore likely to produce more reliable analytical results.
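
The abstract describes the framework only at a high level. The Python below is a minimal sketch of the general idea, not the authors' implementation. It rests on several assumptions that the abstract does not state. Candidate areal units are square grid cells of different side lengths. Weights are binary rook contiguity. The indicators are global Moran's I and Anselin's local Moran's I. The criterion directions are to maximize global I and to minimize the variance of local I. All function names, cell sizes, and the synthetic point data are hypothetical. The study itself may use other grid shapes (e.g. hexagons), weights, or directions. Under these assumptions, the code scores each candidate cell size and keeps those that no other candidate Pareto-dominates.

```python
import random
from statistics import pvariance

def rook_neighbors(rows, cols):
    """Binary rook-contiguity neighbour lists for a rows x cols grid (row-major indices)."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            nbrs[r * cols + c] = [rr * cols + cc for rr, cc in cand
                                  if 0 <= rr < rows and 0 <= cc < cols]
    return nbrs

def _deviations(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def morans_i(values, nbrs):
    """Global Moran's I, binary weights: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = _deviations(values)
    s0 = sum(len(nb) for nb in nbrs.values())
    num = sum(z[i] * z[j] for i, nb in nbrs.items() for j in nb)
    den = sum(zi * zi for zi in z)  # ZeroDivisionError below if all cells are equal
    return (len(z) / s0) * (num / den)

def local_morans_i(values, nbrs):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum_i z_i^2 / n."""
    z = _deviations(values)
    m2 = sum(zi * zi for zi in z) / len(z)
    return [(z[i] / m2) * sum(z[j] for j in nbrs[i]) for i in range(len(z))]

def grid_counts(points, extent, cell):
    """Count (x, y) points per square cell of side `cell` over [0, extent) x [0, extent)."""
    n = int(extent // cell)
    counts = [0] * (n * n)
    for x, y in points:
        c, r = min(int(x // cell), n - 1), min(int(y // cell), n - 1)
        counts[r * n + c] += 1
    return counts, n

def dominates(a, b, senses):
    """True if score vector a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all((x >= y) if s == "max" else (x <= y) for x, y, s in zip(a, b, senses))
    better = any((x > y) if s == "max" else (x < y) for x, y, s in zip(a, b, senses))
    return no_worse and better

def pareto_front(scores, senses):
    """Keys of `scores` whose score vectors no other candidate dominates."""
    return [k for k, v in scores.items()
            if not any(dominates(w, v, senses) for k2, w in scores.items() if k2 != k)]

def evaluate_cell_sizes(points, extent, cells):
    """Score each candidate cell size by (global Moran's I, variance of local Moran's I)."""
    scores = {}
    for cell in cells:
        counts, n = grid_counts(points, extent, cell)
        nbrs = rook_neighbors(n, n)
        scores[cell] = (morans_i(counts, nbrs), pvariance(local_morans_i(counts, nbrs)))
    return scores

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical "geotagged posts": two Gaussian clusters plus uniform background noise.
    pts = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(300)]
    pts += [(rng.gauss(7, 1.5), rng.gauss(6, 1.5)) for _ in range(300)]
    pts += [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
    pts = [(x, y) for x, y in pts if 0 <= x < 10 and 0 <= y < 10]
    scores = evaluate_cell_sizes(pts, extent=10.0, cells=[0.5, 1.0, 2.0, 2.5, 5.0])
    # Assumed directions: maximize global I, minimize variance of local I.
    front = pareto_front(scores, senses=("max", "min"))
    for cell, (gi, lv) in scores.items():
        print(f"cell={cell:>4}: I={gi:.3f}  var(I_i)={lv:.3f}  pareto={cell in front}")
```

The `senses` tuple keeps the criterion directions explicit, so you can flip them if the intended objectives differ. Here `cell` is a side length. An areal unit of, say, 30 km² corresponds to a square side of about 5.48 km. Further criteria can be added by extending each score tuple and `senses` together.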
