Minimum Collection Period for Viable Population Estimation from Social Media

Using volunteered geographic information for population estimation has shown promise in urban planning, emergency response, and disaster recovery. A high volume of geospatially enabled Tweets can be used to create population curves and/or heatmaps broken down by day of week and hour of day. These estimates require adequate data; without it, confidence in them will be low. This is especially pertinent to disaster response, where Tweet collection for a new city, town, or locale may need to be deployed rapidly. Using data removal methods applied in previous work, this work explores how much temporal data is needed by comparing data sets from increasingly long collection periods. Generating these estimates also requires identifying and mitigating data from automated Twitter bots. This work examines the integration of a modern, web-services-based Twitter bot assessment algorithm, executes data removal experiments on collected data, describes the technical architecture, and discusses results and follow-on work.
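The comparison of collection periods described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`weekly_curve`, `rmse`, `error_by_period`), the normalization of counts to shares, the use of root mean square error as the comparison metric, and the synthetic timestamps are all assumptions made for illustration. The sketch bins Tweet timestamps into a day-of-week by hour-of-day curve, then measures how far the curve built from the first N weeks of data is from the curve built from the full collection.

```python
from collections import Counter
from datetime import datetime, timedelta
import math
import random


def weekly_curve(timestamps):
    """Return a 7x24 grid: share of all tweets falling in each (weekday, hour) bin."""
    counts = Counter((t.weekday(), t.hour) for t in timestamps)
    total = sum(counts.values())
    return [
        [counts[(d, h)] / total if total else 0.0 for h in range(24)]
        for d in range(7)
    ]


def rmse(curve_a, curve_b):
    """Root mean square error between two 7x24 grids (assumed metric)."""
    sq = [(x - y) ** 2
          for row_a, row_b in zip(curve_a, curve_b)
          for x, y in zip(row_a, row_b)]
    return math.sqrt(sum(sq) / len(sq))


def error_by_period(timestamps, start, week_counts):
    """Compare the curve from the first N weeks against the full-data curve."""
    full = weekly_curve(timestamps)
    errors = {}
    for weeks in week_counts:
        cutoff = start + timedelta(weeks=weeks)
        subset = [t for t in timestamps if start <= t < cutoff]
        errors[weeks] = rmse(weekly_curve(subset), full)
    return errors


# Synthetic, hypothetical data: 8 weeks of tweets with more activity in daytime.
rng = random.Random(0)
start = datetime(2017, 1, 2)
tweets = []
for hour_index in range(8 * 7 * 24):
    moment = start + timedelta(hours=hour_index)
    n = rng.randint(0, 3) + (2 if 8 <= moment.hour <= 18 else 0)
    tweets.extend([moment] * n)

for weeks, err in error_by_period(tweets, start, [1, 2, 4, 8]).items():
    print(f"{weeks} week(s): RMSE vs full curve = {err:.6f}")
```

In a sketch like this, the shortest collection period whose error stays below a chosen tolerance would serve as a candidate minimum collection period; the tolerance itself is a judgment call that the sketch does not settle.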
