Improving the Veracity of Open and Real-Time Urban Data

Within the context of the smart city, data are an integral part of the digital economy and are used as input for decision making and policy formation, and to inform citizens, city managers and commercial organisations. Reflecting on our experience of developing real-world software applications which rely heavily on urban data, this article critically examines the veracity of such data (their authenticity and the extent to which they accurately (precision) and faithfully (fidelity, reliability) represent what they are meant to) and how it can be assessed in the absence of quality reports from data providers. While data quality needs to be considered at all stages of the data lifecycle and in the development and use of applications, open data are often provided ‘as-is’ with no guarantees about their veracity, continuity or lineage (documentation that establishes provenance and fitness for use). This allows data providers to share data with undocumented errors, absences, and biases. If left unchecked, these data quality issues can propagate through multiple systems and lead to poor smart city applications and unreliable ‘evidence-based’ decisions. The danger is that open government data portals will come to be seen by users and critics as untrusted, unverified and uncurated data-dumps. Drawing on our own experiences, we describe the process we used to detect and handle errors. This work highlights the necessary janitorial role carried out by data scientists and developers to ensure that data are cleaned, parsed, validated and transformed for use. This important process requires effort, knowledge, skill and time, yet it is often hidden in the resulting application and is not shared with other data users.
In this paper, we propose that, rather than lose this knowledge when data providers do not document it in metadata and user guides, data portals should provide a crowdsourcing mechanism to generate and record user observations and fixes, thereby improving the quality of urban data and open government portals.
