A machine learning approach to geochemical mapping

Geochemical maps provide invaluable evidence to guide decisions on issues of mineral exploration, agriculture, and environmental health. However, the high cost of chemical analysis means that ground sampling density will always be limited. Traditionally, geochemical maps have been produced by interpolating measured element concentrations between sample sites using models based on the spatial autocorrelation of the data (e.g. semivariogram models for ordinary kriging). In their simplest form, such models ignore potentially useful auxiliary information about the region, and map accuracy may suffer as a result. In contrast, this study uses quantile regression forests (an elaboration of random forest) to investigate whether high resolution auxiliary information alone can support the generation of accurate and interpretable geochemical maps. This paper presents a summary of the performance of quantile regression forests in predicting element concentrations, loss on ignition and pH in the soils of south west England using high resolution remote sensing and geophysical survey data. Through stratified 10-fold cross-validation, we find that quantile regression forests generally predict soil geochemistry in south west England more accurately than ordinary kriging. Concentrations of immobile elements whose distributions are most tightly controlled by bedrock lithology are predicted with the greatest accuracy (e.g. Al, with a cross-validated R² of 0.79), while concentrations of more mobile elements prove harder to predict. In addition to providing high prediction accuracy, models built on high resolution auxiliary variables allow informative, process-based interpretations to be made. In conclusion, this study has shown that combining information from multiple datasets makes it possible to map and understand the surface environment with greater accuracy and detail than previously possible.
As the quality and coverage of remote sensing and geophysical surveys continue to improve, machine learning methods will provide a means to interpret the otherwise-uninterpretable.
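The abstract relies on two technical ingredients: quantile regression forests, which return a full conditional distribution rather than a single prediction, and stratified 10-fold cross-validation scored by R². The standard-library Python sketch below illustrates both under stated assumptions. The quantile step follows Meinshausen's (2006) weighting, where each training sample is weighted by how often it shares a leaf with the query point, normalised by leaf size and averaged over trees. Leaf indices are assumed to come from an already trained forest (for example scikit-learn's `RandomForestRegressor.apply`). The sort-and-deal stratification for a continuous target and the 1 − SSE/SST definition of cross-validated R² are assumptions, not necessarily the exact procedure used in the study, and all function names are hypothetical.

```python
def qrf_weights(train_leaves, test_leaves):
    """Quantile regression forest weights for one query point.

    train_leaves[i][t]: leaf index of training sample i in tree t (assumed input).
    test_leaves[t]: leaf index of the query point in tree t.
    """
    n_train = len(train_leaves)
    n_trees = len(test_leaves)
    weights = [0.0] * n_train
    for t in range(n_trees):
        # Training samples that land in the same leaf as the query point in tree t.
        members = [i for i in range(n_train) if train_leaves[i][t] == test_leaves[t]]
        for i in members:
            # Each tree contributes 1/n_trees, shared equally among leaf members.
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def qrf_quantile(y_train, weights, alpha):
    """Smallest training response whose weighted empirical CDF reaches alpha."""
    order = sorted(range(len(y_train)), key=lambda i: y_train[i])
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        if cumulative >= alpha - 1e-12:  # small tolerance for float round-off
            return y_train[i]
    return y_train[order[-1]]


def stratified_folds(y, n_folds=10):
    """Assign folds by sorting on the target and dealing samples round-robin,
    so every fold spans the full range of concentrations (assumed scheme)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    folds = [0] * len(y)
    for rank, i in enumerate(order):
        folds[i] = rank % n_folds
    return folds


def r_squared(y_true, y_pred):
    """Coefficient of determination (1 - SSE/SST) on pooled out-of-fold predictions."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy forest: 4 training samples, 2 trees (hypothetical leaf assignments).
    y_train = [1.0, 2.0, 3.0, 4.0]
    w = qrf_weights([[0, 0], [0, 1], [1, 1], [1, 1]], [0, 1])
    print([qrf_quantile(y_train, w, a) for a in (0.1, 0.5, 0.9)])
    print(stratified_folds([5, 1, 4, 2, 3], n_folds=2))
```

Reading off several quantiles (here the 10th, 50th and 90th percentiles) is what lets a quantile regression forest map prediction uncertainty alongside the median estimate. Dealing sorted samples round-robin is one simple way to make each fold representative of both low and high concentrations.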