Opening practice: supporting reproducibility and critical spatial data science

This paper reflects on a number of recent trends towards a more open and reproducible approach to geographic and spatial data science. In particular, it considers the trend towards Big Data and the impact this is having on spatial data analysis and modelling. It identifies a turn in academia towards coding as a core analytic tool, and away from proprietary software tools offering ‘black boxes’ in which the internal workings of the analysis are not revealed. The paper argues that such closed software is problematic, and considers a number of ways in which known issues in spatial data analysis, such as the modifiable areal unit problem (MAUP), could be overlooked when working with closed tools, leading to problems of interpretation and possibly to inappropriate actions and policies based on them. In addition, the paper considers the role that reproducible and open spatial science may play in such an approach, taking into account the issues raised. It highlights the dangers of failing to account for the geographical properties of data now that all data are spatial (they are collected somewhere) and the problems of a desire for $n$ = all observations in data science, and it identifies the need for a critical approach: one in which openness, transparency, sharing and reproducibility provide a mantra for defensible and robust spatial data science.
