Fourth paradigm GIScience? Prospects for automated discovery and explanation from data

ABSTRACT This article discusses the prospects for the automated discovery of explanatory models directly from geospatial data. Approaches based on machine learning generally lead to models that cannot be understood by humans or related to domain theory. The approach described here suggests we can instead construct models from fragments of domain understanding, such as commonly encountered equation forms, known constants and laws, so that the discovered models can be both understood by humans and directly compared with known theory. We then propose a conceptual model of the discovery process, in which the various stages and components of discovery and explanation work together to learn models from data. The approach weaves together ideas for describing models from Harvey’s book ‘Explanation in Geography’ with current thinking from Inductive Process Modeling on how explanatory models might be ‘discovered’ from data. Along the way, we also highlight: (i) why it is important to have models that explain as well as predict, (ii) how such an approach contrasts with, and goes beyond, current work in deep learning, (iii) how the task of model discovery might be tackled computationally and (iv) how computational model discovery can play a valuable role in creating geographical explanations.
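
To make the idea of assembling models from known equation forms more concrete, the following is a minimal sketch and not the article's method. It assumes a tiny, hypothetical library of three equation forms (`linear`, `power_law`, `exponential`), fits each to synthetic distance-decay data by least squares on transformed variables, and keeps the form with the smallest squared error. The names `FORMS`, `discover`, `distances` and `flows`, and the data themselves, are illustrative assumptions.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical library of equation forms. Each fitter transforms the data so
# that its form becomes linear, then returns a predictor in the original space.
def linear(xs, ys):
    c, m = fit_line(xs, ys)
    return lambda x: c + m * x                     # y = c + m*x

def power_law(xs, ys):
    c, m = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return lambda x: math.exp(c) * x ** m          # y = a * x^b

def exponential(xs, ys):
    c, m = fit_line(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(c + m * x)           # y = a * exp(b*x)

FORMS = {"linear": linear, "power_law": power_law, "exponential": exponential}

def discover(xs, ys):
    """Fit every candidate form and rank them by sum of squared errors."""
    scores = {}
    for name, fit in FORMS.items():
        predict = fit(xs, ys)
        scores[name] = sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))
    best = min(scores, key=scores.get)
    return best, scores

# Synthetic (assumed) data generated from an inverse-square distance decay.
distances = [1.0, 2.0, 4.0, 8.0, 16.0]
flows = [100.0 * d ** -2 for d in distances]

best_form, errors = discover(distances, flows)
print(best_form, errors)
```

Because every candidate is a named, closed-form equation, the selected model can be read and compared against theory directly, which is the contrast with opaque learned models that the abstract draws. A fuller system would also search over combinations of processes and penalize complexity, which this sketch does not attempt.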
