Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information

Machine learning algorithms (MLAs) are a powerful group of data-driven inference tools that offer an automated means of recognizing patterns in high-dimensional data. Hence, there is much scope for applying MLAs to the rapidly increasing volumes of remotely sensed geophysical data for geological mapping problems. We carry out a rigorous comparison of five MLAs (Naive Bayes, k-Nearest Neighbors, Random Forests, Support Vector Machines, and Artificial Neural Networks) in the context of a supervised lithology classification task using widely available and spatially constrained remotely sensed geophysical data. We further compare the MLAs on their sensitivity to variations in the degree of spatial clustering of training data and on their response to the inclusion of explicit spatial information (spatial coordinates). Our work identifies Random Forests as a good first-choice algorithm for the supervised classification of lithology using remotely sensed geophysical data. Random Forests is straightforward to train, computationally efficient, highly stable with respect to variations in classification model parameter values, and as accurate as, or substantially more accurate than, the other MLAs trialed. The results of our study indicate that as training data become increasingly dispersed across the region under investigation, MLA predictive accuracy improves dramatically. Explicit spatial information on its own generates accurate lithology predictions, but it should be combined with geophysical data in order to generate geologically plausible predictions. MLAs such as Random Forests are valuable tools for generating reliable first-pass predictions for practical geological mapping applications that combine widely available geophysical data.
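The two experimental factors described above (spatial clustering of training data, and adding spatial coordinates as inputs) can be illustrated with a minimal sketch. It is not the study's code or data: the grid, the single "geophysical" feature, the diagonal contact between two units, and every function name below are hypothetical. A hand-written k-Nearest Neighbors classifier stands in for the five MLAs compared in the paper. The sketch only shows the mechanics of setting up such a comparison and makes no claim about which configuration scores higher.

```python
import random

N = 30  # side length of a hypothetical N x N study area


def make_grid(seed=0):
    """Build synthetic cells as (x, y, feature, label) tuples."""
    rng = random.Random(seed)
    cells = []
    for x in range(N):
        for y in range(N):
            label = 0 if x + y < N else 1  # two units split by a diagonal contact
            # One noisy "geophysical" feature with a weak spatial trend.
            feature = label + 0.02 * x + rng.gauss(0.0, 0.6)
            cells.append((x, y, feature, label))
    return cells


def sample_training(cells, n_train, clustered, seed=0):
    """Draw training cells either dispersed (random) or spatially clustered."""
    rng = random.Random(seed)
    if not clustered:
        return rng.sample(cells, n_train)
    chosen = []
    for lab in (0, 1):
        pool = [c for c in cells if c[3] == lab]
        cx, cy = rng.choice(pool)[:2]  # random cluster centre inside this unit
        pool.sort(key=lambda c: (c[0] - cx) ** 2 + (c[1] - cy) ** 2)
        chosen.extend(pool[: n_train // 2])  # the cells nearest the centre
    return chosen


def to_vector(cell, use_xy):
    """Feature vector, optionally extended with coordinates scaled by 1/N."""
    x, y, feature, _ = cell
    return (feature, x / N, y / N) if use_xy else (feature,)


def knn_predict(train, cell, k, use_xy):
    """Majority vote among the k nearest training cells (use odd k)."""
    q = to_vector(cell, use_xy)

    def sq_dist(t):
        return sum((a - b) ** 2 for a, b in zip(to_vector(t, use_xy), q))

    nearest = sorted(train, key=sq_dist)[:k]
    votes = sum(t[3] for t in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(cells, train, k=5, use_xy=False):
    """Fraction of held-out cells (those not used for training) predicted correctly."""
    train_set = set(train)
    held_out = [c for c in cells if c not in train_set]
    correct = sum(knn_predict(train, c, k, use_xy) == c[3] for c in held_out)
    return correct / len(held_out)


cells = make_grid()
for clustered in (True, False):
    train = sample_training(cells, 100, clustered)
    for use_xy in (False, True):
        acc = accuracy(cells, train, k=5, use_xy=use_xy)
        print(f"clustered={clustered} use_xy={use_xy} accuracy={acc:.3f}")
```

Swapping the k-Nearest Neighbors function for another classifier, and repeating the sampling over several seeds, would bring the sketch closer to the kind of comparison the abstract describes.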
