Big Issues for Big Data

In this paper we consider some of the issues of working with big data and big spatial data and highlight the need for an open and critical framework. We focus on a set of challenges underlying the collection and analysis of big data. In particular, we consider 1) the issues related to inference when working with usually biased big data, challenging the assumed inferential superiority of data in which the number of observations, n, approaches the population size, N (n → N), and the need for data science analyses that answer questions of practical significance, with greater emphasis on the size of the effect than on the truth or falsehood of a statistical statement; 2) the need to accept messiness in your data and, because of this, to document all operations undertaken on the data in support of openness and reproducibility paradigms; and 3) the need to explicitly seek to understand the causes of bias, messiness, etc. in the data and the inferential consequences of using such data in analyses, by adopting critical approaches to spatial data science. More broadly, we consider the need to place individual data science studies in wider social and economic contexts, along with the role of inferential theory in the presence of big data and issues relating to messiness and complexity in big data.
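Point 1 can be made concrete with a small simulation. The following is a minimal sketch in Python, not the authors' analysis: the population size, the outcome-dependent inclusion probabilities (0.6 versus 0.4), the sample sizes and the helper names (naive_ci, error_decomposition, z_test_p_value) are all illustrative assumptions. It shows a self-selected sample covering roughly half of the population producing a confident but wrong estimate, checks the exact identity popularised by Meng (2018) that the error of a sample mean equals ρ(R, Y) · √((1 − f)/f) · σ_Y, where R is the inclusion indicator and f = n/N, and shows a fixed, negligible effect becoming "statistically significant" once n is large.

```python
import math
import random
import statistics

# Hypothetical population: N units with a binary outcome Y (true share about 0.5).
N = 100_000
rng = random.Random(1)
y = [1 if rng.random() < 0.5 else 0 for _ in range(N)]
true_mean = statistics.fmean(y)

# Assumed outcome-dependent selection: units with Y = 1 are recorded with
# probability 0.6, units with Y = 0 with probability 0.4 (about half observed).
r = [1 if rng.random() < (0.6 if yi else 0.4) else 0 for yi in y]
big = [yi for yi, ri in zip(y, r) if ri]

# A simple random sample that is two orders of magnitude smaller.
srs = rng.sample(y, 500)


def naive_ci(sample):
    """95% normal-approximation CI for a proportion, as if sample were an SRS."""
    m = statistics.fmean(sample)
    se = math.sqrt(m * (1 - m) / len(sample))
    return m - 1.96 * se, m + 1.96 * se


def error_decomposition(y, r):
    """Return the actual error of the sample mean and the product
    rho(R, Y) * sqrt((1 - f) / f) * sigma_Y, using population (divisor-N)
    moments. The two agree up to floating-point rounding."""
    n_pop, n = len(y), sum(r)
    f = n / n_pop
    mean_y = sum(y) / n_pop
    sample_mean = sum(yi for yi, ri in zip(y, r) if ri) / n
    sigma_y = statistics.pstdev(y)
    sigma_r = statistics.pstdev(r)
    cov_ry = sum((yi - mean_y) * (ri - f) for yi, ri in zip(y, r)) / n_pop
    rho = cov_ry / (sigma_r * sigma_y)
    return sample_mean - mean_y, rho * math.sqrt((1 - f) / f) * sigma_y


def z_test_p_value(d, n_per_group):
    """Two-sided p-value of a two-sample z-test for a standardised mean
    difference d, assuming unit variance in both groups."""
    z = d / math.sqrt(2 / n_per_group)
    return math.erfc(abs(z) / math.sqrt(2))


big_error = abs(statistics.fmean(big) - true_mean)
srs_error = abs(statistics.fmean(srs) - true_mean)
actual, decomposed = error_decomposition(y, r)

print(f"n_big={len(big)}, error={big_error:.4f}, naive CI={naive_ci(big)}")
print(f"n_srs={len(srs)}, error={srs_error:.4f}, naive CI={naive_ci(srs)}")
print(f"actual error={actual:.6f}, rho*sqrt((1-f)/f)*sigma={decomposed:.6f}")
# The same tiny effect (0.01 SD) becomes "significant" only because n is huge.
print(z_test_p_value(0.01, 100), z_test_p_value(0.01, 500_000))
```

Under these assumptions the naive 95% interval around the large-sample estimate is narrow yet excludes the true mean, while the 500-unit random sample typically lands much closer. A standardised difference of 0.01 gives z = 5 (p below 10⁻⁶) with 500,000 cases per group but p above 0.5 with 100 per group. The identity shows why more data alone does not help: unless inclusion is unrelated to the outcome (ρ ≈ 0), the error shrinks only as n approaches N.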
