Research on an improved algorithm for cluster analysis

Cluster analysis is an important data mining technique for discovering how data are partitioned and what patterns they contain. By clustering data, one can characterize the data distribution, observe the characteristics of each cluster, and study particular clusters further. Cluster analysis also commonly serves as a preprocessing step for other data mining operations. For these reasons, it has become a very active research topic in data mining. By improving the algorithm of the classical Q-mode factor model, we propose a new clustering method for large-scale databases, the Q-Mode Factor Clustering Method, which dramatically reduces the time complexity of the algorithm.
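For orientation, the following is a minimal sketch of the *classical* Q-mode factor clustering that the method above starts from. It is not the improved algorithm, which the abstract does not describe. It assumes one common textbook formulation: cosine similarity between samples (rows), principal-factor extraction from that n×n similarity matrix, varimax rotation, and assignment of each sample to the factor on which it has the largest absolute loading. The function names `varimax` and `q_mode_factor_clusters`, and the use of varimax, are illustrative assumptions.

```python
import numpy as np


def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Orthogonal varimax rotation of an (n_items x k) loading matrix.

    Uses the standard SVD-based iteration for Kaiser's varimax criterion.
    """
    n, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        gradient = loadings.T @ (
            rotated ** 3 - rotated @ np.diag(np.sum(rotated ** 2, axis=0)) / n
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        previous, objective = objective, float(np.sum(s))
        if previous != 0 and objective / previous < 1 + tol:
            break
    return loadings @ rotation


def q_mode_factor_clusters(data, n_factors: int):
    """Hypothetical sketch of classical Q-mode factor clustering.

    Returns (labels, rotated_loadings), where labels[i] is the factor index
    on which sample i has the largest absolute loading.
    """
    x = np.asarray(data, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    if np.any(norms == 0):
        raise ValueError("every sample must be non-zero for cosine similarity")
    xn = x / norms
    # Q-mode: similarity between samples (rows), an n x n matrix.
    similarity = xn @ xn.T
    eigvals, eigvecs = np.linalg.eigh(similarity)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_factors]  # keep the largest ones
    lam = np.clip(eigvals[order], 0.0, None)
    loadings = eigvecs[:, order] * np.sqrt(lam)  # principal-factor loadings
    rotated = varimax(loadings) if n_factors > 1 else loadings
    labels = np.argmax(np.abs(rotated), axis=1)
    return labels, rotated


if __name__ == "__main__":
    # Toy data: four samples dominated by the first variable, two by the second.
    samples = np.array([
        [5.0, 0.2, 0.1],
        [4.0, 0.1, 0.3],
        [6.0, 0.3, 0.2],
        [5.5, 0.2, 0.2],
        [0.2, 4.0, 0.1],
        [0.1, 5.0, 0.2],
    ])
    labels, _ = q_mode_factor_clusters(samples, n_factors=2)
    print(labels)
```

Forming the n×n similarity matrix costs O(n²p) time and O(n²) memory for n samples and p variables, and its dense eigendecomposition costs O(n³) time. This is what makes the classical formulation expensive on large databases, which is the cost the proposed method targets.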
