Research and application of cluster and association analysis in geochemical data processing

In geoscience applications of data mining, spatial databases built from geophysical and geochemical field measurements can be mined to discover critical knowledge: the spatial distribution of geological targets, their geophysical and geochemical characteristics, the differences among targets, and the relationships among geophysical and geochemical data. Because such data are complex, traditional cluster analysis and association analysis methods are limited in what they can process. This paper presents a clustering algorithm based on density and adaptive density-reachability that can handle clusters of arbitrary shapes, sizes, and densities. For association analysis, mining continuous attributes may reveal useful and interesting insights about the data objects in geoscientific applications, and the paper presents an approach to distance-based quantitative association analysis. Experiments and applications indicate that the algorithm and the approach are effective in real-world applications.
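As background for the density-based idea, the following is a minimal sketch of classic fixed-radius density-based clustering in the DBSCAN style. It is not the adaptive density-reachable algorithm this paper proposes, whose details the abstract does not give. The function names and the parameters eps and min_pts are assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def density_cluster(points, eps, min_pts):
    """Illustrative DBSCAN-style clustering with one global eps (an assumption).

    A point is a core point if at least min_pts points lie within eps of it.
    Clusters grow by following core points that are within eps of each other.
    Returns one label per point: 0, 1, ... for clusters, NOISE otherwise.
    """
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster_id += 1
    return labels


# Hypothetical sample data: two compact groups and one isolated point.
samples = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
labels = density_cluster(samples, eps=1.5, min_pts=3)
```

A single global eps is what makes the classic method struggle when clusters differ in density, which is the limitation an adaptive density-reachable criterion is meant to address.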
