From Big Noise to Big Data: Toward the Verification of Large Data sets for Understanding Regional Retail Flows

There has been much excitement amongst quantitative geographers about newly available datasets characterised by high volume, velocity and variety. This phenomenon is often labelled 'Big Data' and has contributed to methodological and empirical advances, particularly in the visualisation and analysis of social networks. However, a fourth 'v', veracity (or the lack thereof), has been conspicuously absent from the literature. This paper sets out to test the potential for verifying large datasets. It does this by cross-comparing three unrelated estimates of retail flows (human movements from home locations to shopping centres) derived from the following geo-coded sources: 1) a major mobile telephone service provider; 2) a commercial consumer survey; and 3) geotagged Twitter messages. Three spatial interaction models also provided estimates of flow: constrained and unconstrained versions of the 'gravity model' and the recently developed 'radiation model'. We found positive relationships between all data-based and theoretical sources of estimated retail flows. In the analysis, the mobile telephone data fitted the modelled flows and the consumer survey data closely, while flows obtained directly from the Twitter data diverged from the other sources. The research highlights the importance of verifying flow data derived from new sources and demonstrates methods for achieving this.
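
To make the three spatial interaction models concrete, the following is a minimal Python sketch of how each could turn home-zone populations and shopping-centre attractiveness into estimated flows. It is an illustration under stated assumptions, not the paper's calibrated implementation: the exponential distance decay, the parameters `k` and `beta`, the function names, and the toy input values are all hypothetical, and the radiation model is adapted here to separate origin and destination sets by treating the attractiveness of nearer centres as the intervening opportunities.

```python
import math

def unconstrained_gravity(origins, attract, dist, k=1.0, beta=0.1):
    """T_ij = k * O_i * W_j * exp(-beta * d_ij); totals are not constrained.
    The exponential decay and the values of k and beta are assumptions."""
    return [[k * o * w * math.exp(-beta * d) for w, d in zip(attract, row)]
            for o, row in zip(origins, dist)]

def production_constrained_gravity(origins, attract, dist, beta=0.1):
    """T_ij = O_i * W_j * exp(-beta * d_ij) / sum_k W_k * exp(-beta * d_ik).
    Each row of flows sums to the origin total O_i."""
    flows = []
    for o, row in zip(origins, dist):
        weights = [w * math.exp(-beta * d) for w, d in zip(attract, row)]
        total = sum(weights)
        flows.append([o * w / total for w in weights])
    return flows

def radiation(origins, attract, dist):
    """T_ij = O_i * m_i * n_j / ((m_i + s_ij) * (m_i + n_j + s_ij)).
    Assumed adaptation: m_i is the origin population, n_j the centre's
    attractiveness, and s_ij the summed attractiveness of all centres
    strictly closer to origin i than centre j."""
    flows = []
    for m, row in zip(origins, dist):
        out = []
        for n, d in zip(attract, row):
            s = sum(w for w, dk in zip(attract, row) if dk < d)
            out.append(m * m * n / ((m + s) * (m + n + s)))
        flows.append(out)
    return flows

# Hypothetical toy inputs: two home zones, two shopping centres.
origins = [100.0, 50.0]          # shoppers living in each home zone
attract = [10.0, 20.0]           # attractiveness of each centre
dist = [[1.0, 5.0], [4.0, 2.0]]  # distance from zone i to centre j

for model in (unconstrained_gravity, production_constrained_gravity, radiation):
    print(model.__name__, model(origins, attract, dist))
```

In this sketch only the production-constrained model guarantees that the flows leaving each home zone sum to that zone's total, which is one reason the constrained and unconstrained variants can rank the same destinations differently. The radiation variant needs no distance-decay parameter, but mixing population and attractiveness units as done here is a simplification.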
