KNN-kernel based clustering for spatio-temporal database

Extracting and analyzing interesting patterns from spatio-temporal databases has drawn great interest in various fields of research. Recently, a number of studies have explored spatial or temporal data mining, and several clustering algorithms have been proposed. However, few studies have dealt with the integration of spatial and temporal data mining. Moreover, the data in a spatio-temporal database can be categorized as high-dimensional, and current density-based clustering methods may have difficulty with complex data sets, including high-dimensional data. This paper presents Iterative Local Gaussian Clustering (ILGC), an algorithm that combines K-nearest neighbour (KNN) density estimation and kernel density estimation to cluster spatio-temporal data. In this approach, KNN density estimation is extended and combined with a kernel function, where KNN iteratively determines the best local data for kernel density estimation. The local best is defined as the set of neighbouring data that maximizes the kernel function. Bayes' rule is used to address the problem of selecting the best local data. This paper uses the Gaussian kernel, which has proven successful in clustering. To validate the KNN-kernel based algorithm, we compare its performance against other popular algorithms, such as Self-Organizing Maps (SOM) and K-Means, on a crime database. The results show that KNN-kernel based clustering outperforms the others.
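The combination of KNN and Gaussian kernel density estimation described above can be illustrated with a minimal sketch. It is not the ILGC algorithm itself: the iterative selection of the local best and the Bayes' rule step are omitted, and the choice of bandwidth (the distance to the k-th nearest neighbour), the function name `knn_gaussian_density`, and the sample records are all assumptions made for illustration only.

```python
import math


def knn_gaussian_density(points, k):
    """Estimate a density at each point from its k nearest neighbours.

    Assumption of this sketch (not taken from the paper): the bandwidth
    h_i is the distance from point i to its k-th nearest neighbour, and
    only those k neighbours contribute to the Gaussian kernel sum.
    """
    n = len(points)
    dim = len(points[0])
    densities = []
    for i, p in enumerate(points):
        # Euclidean distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        neighbours = dists[:k]
        h = max(neighbours[-1], 1e-12)  # guard against duplicate points
        # Normalising constant of a spherical Gaussian with bandwidth h.
        norm = n * (h * math.sqrt(2.0 * math.pi)) ** dim
        kernel_sum = sum(math.exp(-0.5 * (d / h) ** 2) for d in neighbours)
        densities.append(kernel_sum / norm)
    return densities


# Hypothetical (x, y, t) records: two tight groups and one isolated event.
events = [
    (0.0, 0.0, 1.0), (0.1, 0.0, 1.1), (0.0, 0.1, 0.9),
    (5.0, 5.0, 9.0), (5.1, 5.0, 9.1), (5.0, 5.1, 8.9),
    (20.0, 20.0, 30.0),
]
density = knn_gaussian_density(events, k=2)
```

In this sketch the isolated event receives a much lower density than the grouped events, which is the property a density-based clustering step would rely on. Treating time as just another coordinate in the Euclidean distance is also an assumption; in practice the spatial and temporal attributes would likely need to be scaled first.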
