Pattern mining based on local distribution

Pattern mining is attracting growing attention because of its applications in areas such as machine learning, databases, multimedia, and biology. Although many pattern mining approaches exist, few of them consider the local distribution of the data. In this paper, we design six challenging datasets built around local patterns and propose a new pattern mining algorithm based on local distribution. Unlike traditional pattern mining algorithms, our algorithm first creates a local distribution for each data point using a random approach. Second, the distribution curve of each data point is approximated by a sum of low-frequency curves obtained with a wavelet approach. Third, the coefficients of these low-frequency curves for each data point are clustered using the normalized cut approach. Finally, the algorithm outputs the patterns of the datasets. Experiments show that the new algorithm outperforms traditional unsupervised learning approaches, such as K-means, EM, and the spectral clustering algorithm (SCA), on these six new datasets.
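
The abstract does not spell out the individual steps, so the following is only a minimal sketch of how such a pipeline could look, not the authors' implementation. Several choices are assumptions made for illustration: the "random approach" is taken to be a histogram of distances to randomly drawn nearest neighbours, the wavelet is a one-level Haar transform whose approximation coefficients serve as the low-frequency part, and the normalized cut is a two-way split on the sign of the second eigenvector of the normalized affinity matrix (found by power iteration). All function names and parameters (`local_histogram`, `haar_lowpass`, `normalized_cut_2way`, `mine_patterns`, `k`, `sigma`, and so on) are hypothetical.

```python
import math
import random

def local_histogram(points, i, k, n_samples, n_bins, max_dist, rng):
    """ASSUMED 'random approach': histogram of distances from points[i]
    to random draws (with replacement) from its k nearest neighbours."""
    dists = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)[:k]
    hist = [0.0] * n_bins
    for d in rng.choices(dists, k=n_samples):
        b = min(int(d / max_dist * n_bins), n_bins - 1)
        hist[b] += 1.0 / n_samples
    return hist

def haar_lowpass(signal, levels):
    """Haar approximation (low-frequency) coefficients after `levels` passes."""
    approx = list(signal)
    for _ in range(levels):
        approx = [(approx[t] + approx[t + 1]) / math.sqrt(2)
                  for t in range(0, len(approx) - 1, 2)]
    return approx

def normalized_cut_2way(features, sigma, n_iter=200):
    """Two-way normalized cut via the second eigenvector of D^-1/2 W D^-1/2."""
    n = len(features)
    # Gaussian affinity between coefficient vectors.
    W = [[math.exp(-math.dist(a, b) ** 2 / (2 * sigma ** 2)) for b in features]
         for a in features]
    deg = [sum(row) for row in W]
    s = [1.0 / math.sqrt(d) for d in deg]
    # Shift to (I + D^-1/2 W D^-1/2) / 2 so every eigenvalue lies in [0, 1].
    M = [[0.5 * (s[i] * W[i][j] * s[j] + (1.0 if i == j else 0.0)) for j in range(n)]
         for i in range(n)]
    # The top eigenvector is D^1/2 * 1; it is projected out at every step.
    total = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / total for d in deg]
    rng = random.Random(0)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(n_iter):
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Split on the sign of the generalized eigenvector y = D^-1/2 v.
    return [1 if s[i] * v[i] > 0 else 0 for i in range(n)]

def mine_patterns(points, k=4, n_samples=32, n_bins=8, max_dist=2.5,
                  levels=1, sigma=0.3, seed=0):
    """Hypothetical end-to-end pipeline: histogram -> Haar low-pass -> Ncut."""
    rng = random.Random(seed)
    hists = [local_histogram(points, i, k, n_samples, n_bins, max_dist, rng)
             for i in range(len(points))]
    coeffs = [haar_lowpass(h, levels) for h in hists]
    return normalized_cut_2way(coeffs, sigma)

# Toy data (not one of the paper's datasets): a tightly spaced grid and a loosely spaced grid.
tight = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
loose = [(10.0 + i, 10.0 + j) for i in range(4) for j in range(4)]
points = tight + loose
labels = mine_patterns(points)
print(labels)
```

In this toy run the grouping is driven only by each point's neighbour-distance histogram, not by its coordinates, which is the sense in which the sketch is "based on local distribution". A multi-way variant would need either recursive two-way cuts or several eigenvectors, which the sketch leaves out.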
