Game theory interpretation of digital soil mapping convolutional neural networks

Abstract. The use of complex models such as deep neural networks has yielded large improvements in predictive tasks in many fields, including digital soil mapping (DSM). One of the concerns about using these models is that they are perceived as black boxes with low interpretability. In this paper we introduce the use of game theory, specifically SHAP values, to interpret a digital soil mapping model. SHAP values represent the contribution of a covariate to the final model predictions. We applied this method to a multi-task convolutional neural network trained to predict soil organic carbon (SOC) in Chile. The results show the contribution of each covariate to the model predictions in three different contexts: (a) at a local level, showing the contribution of the various covariates to a single prediction, (b) at a global level, giving an overall understanding of covariate contributions, and (c) in a spatial context, mapping how the contributions vary across the study area. The latter constitutes a novel application of SHAP values and also the first detailed analysis of a model in a spatial context. The analysis of the SOC model in Chile corroborated that the model is capturing sensible relationships between SOC and rainfall, temperature, elevation, slope and topographic wetness index. The results agree with commonly reported relationships and highlight environmental thresholds that coincide with significant areas within the study area. This contribution addresses the limitations of current model interpretation in digital soil mapping, especially in a spatial context. We believe that SHAP values are a valuable tool that should be included within the DSM framework, since they address important concerns regarding the interpretability of more complex models. Model interpretation is a crucial step that could lead to new knowledge and improve our understanding of soils.
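To make the notion of a covariate's "contribution" concrete, the following is a minimal sketch of the exact Shapley value computation that SHAP values are based on, for a single prediction. It is not the paper's implementation: the function `toy_soc_model`, its coefficients, the covariate values and the baseline are all hypothetical, and covariates absent from a coalition are simply replaced by a baseline value, which is one common simplifying assumption.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    predict  : function taking a dict of covariate values, returning a number
    x        : dict of covariate values at the location being explained
    baseline : dict of reference values used when a covariate is "absent"
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                # Covariates outside the coalition take their baseline value
                without = {m: (x[m] if m in coalition else baseline[m])
                           for m in names}
                with_cov = dict(without)
                with_cov[name] = x[name]
                # Marginal contribution of adding this covariate
                total += weight * (predict(with_cov) - predict(without))
        phi[name] = total
    return phi


def toy_soc_model(c):
    # Hypothetical stand-in for a trained model (NOT the paper's CNN):
    # SOC rises with rainfall, falls with temperature, and includes a
    # rainfall-elevation interaction.
    return (2.0 + 0.004 * c["rainfall"] - 0.1 * c["temperature"]
            + 0.00001 * c["rainfall"] * c["elevation"])


# Hypothetical reference point and location to explain
baseline = {"rainfall": 500.0, "temperature": 15.0, "elevation": 200.0}
x = {"rainfall": 1500.0, "temperature": 10.0, "elevation": 1000.0}

phi = shapley_values(toy_soc_model, x, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the property that lets them be read as an additive breakdown of a single prediction. Exact enumeration scales exponentially with the number of covariates, so in practice approximations are used for models with many inputs.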
