Application of semi-supervised fuzzy c-means method in clustering multivariate geochemical data, a case study from the Dalli Cu-Au porphyry deposit in central Iran

Abstract: Supervised and unsupervised learning methods are widely used to classify and cluster multivariate geochemical data. Supervised learning methods use training data to classify the geochemical data, whereas unsupervised learning methods extract hidden structures in the data and assign samples to clusters. A semi-supervised learning method is a hybrid that simultaneously extracts the hidden structure of non-training data and uses training data to improve the clustering analysis. In this research, eleven soil geochemical variables associated with the Dalli Cu-Au porphyry deposit, located in the central part of Iran, were first selected using hierarchical clustering analysis and expert knowledge. The semi-supervised fuzzy c-means clustering method (ssFCM) was then used to separate multivariate soil geochemical anomalies from background, in order to guide further drilling. The results were compared with those of fuzzy c-means clustering (FCM) applied to the same samples. The fundamental concept of the ssFCM method is similar to that of the widely used FCM method, except that the training data, in this case trenching data, are incorporated into the objective function of the clustering analysis. The soil classification results were validated using cluster validity indices, cross-validation and uncertainty measurement. The validation results demonstrated that the ssFCM method classified the multivariate soil geochemical data better than the FCM method. For further validation, the membership values of the favorable classes identified by both the FCM and ssFCM methods were converted to grid maps and compared with the spatial distribution of copper anomalies along the trenches and with the surface projection of the boreholes. This comparison suggests that the favorable multivariate soil geochemical anomalies identified by the ssFCM analysis correlate well with copper mineralization in rock channel and drill core samples.
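
To make the relation between FCM and ssFCM concrete, the following is a minimal sketch of one common formulation of fuzzy c-means with partial supervision. The abstract does not state the exact objective function used in the study, so everything here is an assumption: the fuzzifier is fixed at m = 2, distances are Euclidean, and a supervision term weighted by a hypothetical parameter `alpha` penalizes deviation from reference memberships `F` for the labelled samples flagged in `b`. The function name `ssfcm` and the toy data are illustrative only and are not the geochemical data of the study.

```python
import numpy as np

def ssfcm(X, F, b, c, alpha=1.0, n_iter=100, tol=1e-6, seed=0):
    """Sketch of partially supervised fuzzy c-means (assumed form, m = 2).

    X: (n, d) samples; F: (n, c) reference memberships for labelled rows;
    b: (n,) 1 for labelled samples, 0 otherwise; alpha: supervision weight.
    Assumed objective: sum u^2 d^2 + alpha * sum (u - f*b)^2 d^2.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)           # each row sums to 1
    Fb = F * b[:, None]                         # zero for unlabelled rows
    for _ in range(n_iter):
        # Prototypes: weighted means, weights taken from both objective terms
        W = U**2 + alpha * (U - Fb)**2
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances, clamped to avoid division by zero
        D2 = ((X[:, None, :] - V[None, :, :])**2).sum(axis=2)
        inv = 1.0 / np.maximum(D2, 1e-12)
        fcm_part = inv / inv.sum(axis=1, keepdims=True)  # plain FCM update (m = 2)
        slack = 1.0 + alpha * (1.0 - Fb.sum(axis=1, keepdims=True))
        U_new = (slack * fcm_part + alpha * Fb) / (1.0 + alpha)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

# Toy usage with synthetic data (not the study's data): two blobs,
# three labelled samples per blob standing in for "training" samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(5.0, 1.0, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
b = np.zeros(40)
b[[0, 1, 2, 20, 21, 22]] = 1
F = np.zeros((40, 2))
F[np.arange(40), labels] = 1.0                  # one-hot; masked by b inside ssfcm
U, V = ssfcm(X, F, b, c=2, alpha=1.0)
```

Under these assumptions, setting `alpha = 0` recovers ordinary FCM, and unlabelled samples always receive the plain FCM membership update; only labelled samples are pulled toward their reference memberships. This mirrors the abstract's description of ssFCM as FCM with training data added to the objective function.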
