Combining Graph Clustering and Quantitative Association Rules for Knowledge Discovery in Geochemical Data Problem

Separating geochemical patterns from background and linking them to the associated mineralization remains challenging because of the complex structure of mineral deposits. Identifying geochemical anomalies that are spatially associated with mineralization requires in-depth knowledge of the dependencies among the measured attributes. Quantitative association rules (QARs) are used to discover notable relations and dependencies between attributes in a dataset, but such relationships are difficult to extract from geochemical data. Previous studies have not proposed an association rule methodology designed for geochemical data, and the classical methods, which were designed for Boolean and nominal attributes, require prior discretization, which limits their ability to process complex data. In this paper, we propose a hybrid method of graph clustering and quantitative association rules (GCQAR) as a new way of identifying significant geochemical patterns. Graph clustering (GC) is used as the partitioning paradigm because of its ability to handle large-scale datasets. The GC is based on modularity, so that it generates the groups of the graph effectively, avoids over-partitioning, and covers all the rules. In each partition, a set of geochemical quantitative association rules is produced. The experimental study was performed on data collected in Xiaoshan, Henan Province, China. GCQAR shows significant benefits in recognizing geochemical patterns compared with the traditional methods used in geochemistry.
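The two ingredients named in the abstract, modularity as the partitioning criterion and interval-based rules evaluated inside a partition, can be illustrated with a minimal sketch. It is not the authors' GCQAR implementation: the functions `modularity`, `covers` and `rule_quality`, the toy graph, and the attribute names `elem_a` and `elem_b` are all hypothetical and chosen only for illustration. The modularity formula is the standard Newman definition for an undirected, unweighted graph, and support and confidence are the usual association rule measures applied to numeric intervals.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    label = {node: i for i, group in enumerate(communities) for node in group}
    # Observed fraction of edges that fall inside a community.
    inside = sum(1 for u, v in edges if label[u] == label[v]) / m
    # Expected fraction if edges were rewired at random with degrees preserved.
    expected = sum(
        (sum(degree.get(n, 0) for n in group) / (2 * m)) ** 2
        for group in communities
    )
    return inside - expected


def covers(row, intervals):
    """True if every attribute of the row lies inside its [lo, hi] interval."""
    return all(lo <= row[attr] <= hi for attr, (lo, hi) in intervals.items())


def rule_quality(rows, antecedent, consequent):
    """Support and confidence of the quantitative rule antecedent -> consequent."""
    n_ante = sum(covers(r, antecedent) for r in rows)
    n_both = sum(covers(r, antecedent) and covers(r, consequent) for r in rows)
    support = n_both / len(rows)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


# Toy graph (assumption): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])

# Toy samples of one partition (invented values, not data from the study).
rows = [
    {"elem_a": 12.0, "elem_b": 0.8},
    {"elem_a": 15.0, "elem_b": 0.9},
    {"elem_a": 2.0, "elem_b": 0.1},
    {"elem_a": 3.0, "elem_b": 0.7},
]
support, confidence = rule_quality(
    rows, {"elem_a": (10.0, 20.0)}, {"elem_b": (0.5, 1.0)}
)
print(q_split, q_single, support, confidence)
```

On the toy graph, splitting into the two triangles gives a higher modularity than keeping a single group, which is the kind of signal a modularity-based clustering uses to decide where to partition. The rule `elem_a in [10, 20] -> elem_b in [0.5, 1.0]` needs no prior discretization of the attributes, because the intervals are part of the rule itself.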
