Geochemical Prospectivity Mapping Through a Feature Extraction–Selection Classification Scheme

Abstract

Machine learning (ML) schemes can improve the success of geochemical prospectivity mapping. This study examines the effectiveness of several feature extraction and selection approaches, combined with a variety of ML algorithms applied to multielement soil and lithogeochemical data, for identifying new prospective Pb–Zn mineralisation in the Irankuh area. Singular value decomposition (SVD) was used as a dimensionality reduction technique to remove noise from the geochemical data. Feature selection techniques were then applied, including filter-based methods such as principal component analysis (PCA), Pearson’s correlation coefficient (PCC), correlation-based feature selection (CFS) and information gain ratio (IGR), as well as wrapper models, in combination with support vector machines, logistic regression and random forests. The performance of the ML algorithms, assisted by the feature extraction and selection methods, was assessed using 10-fold cross-validation of separate training and testing data subsets. SVD improved the performance of all three classifiers. The ML algorithms were particularly effective when trained on two transformed principal components that are linked to a suite of elements associated with the sulphide mineralisation and with variations in the host lithologies. PCA and PCC were generally the most effective feature selection methods for support vector machines. Logistic regression classified better with PCA, IGR and a wrapper model, whereas random forests were more accurate with PCA and PCC. A geochemical prospectivity map of the study area was derived from support vector machines trained on two principal components, the best-performing ML scheme, and it delineates three new and distinct targets for more detailed exploration.
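As a rough illustration of one of the filter-based steps named in the abstract, the sketch below ranks features by the absolute value of Pearson's correlation coefficient with a binary class label. It is not the workflow used in the study: the element names, concentrations, labels and the helper names `pearson` and `rank_features_by_pcc` are all invented for this example, and only the PCC filter is shown (no SVD, classifier or cross-validation).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features_by_pcc(samples, labels, names):
    """Return (name, PCC) pairs sorted by |PCC| with the label, strongest first."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in samples]
        scores.append((name, pearson(column, labels)))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical toy data: rows are samples, columns are element concentrations.
names = ["Pb", "Zn", "Sr"]
samples = [
    [120.0, 300.0, 50.0],
    [110.0, 280.0, 52.0],
    [15.0, 40.0, 49.0],
    [20.0, 35.0, 51.0],
    [130.0, 310.0, 50.0],
    [10.0, 30.0, 50.0],
]
labels = [1, 1, 0, 0, 1, 0]  # 1 = mineralised, 0 = barren (invented labels)

ranking = rank_features_by_pcc(samples, labels, names)
top_two = [name for name, _ in ranking[:2]]  # features a filter would retain
```

A filter of this kind scores each feature independently of any classifier, which is what distinguishes it from the wrapper models also mentioned in the abstract; the retained columns would then be passed on to a classifier and evaluated by cross-validation.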
