Data Evolvement Analysis Based on Topology Self-Adaptive Clustering Algorithm

With the rapid advance of Internet technology, users must deal with a tremendous amount of data every day. One of the most useful kinds of knowledge for users concerns the transfer of information between two data sets collected at different time stages: which information has newly appeared, which has become obsolete, and which remains unchanged. This task is formally called data evolvement analysis. Clustering is a natural approach to it, since comparing the clustering results formed at different time stages reveals the transfer of information. Unfortunately, this plan is impractical with a static algorithm, because the clustering must be rerun every time the input data are updated, which is time-consuming. A dynamic clustering algorithm that automatically adjusts its structure to express the transfer is therefore needed. To this end, this paper proposes a novel Topology Self-Adaptive Clustering algorithm (TSAC). TSAC is derived from the Self-Organizing Map (SOM) algorithm, but it requires no prior assumption about the neuron topology, and its topology is remodeled whenever the input data are updated. To further improve its performance, it uses a minimum spanning tree to preserve topological order, which no traditional SOM-based topology-adaptive algorithm has done. To measure the extent of the transfer clearly, it partitions the data space into grids and calculates the density of each grid to quantify the transfer. Experimental results demonstrate that TSAC automatically tunes its topology as the input data change. Combined with the grid structure, the algorithm allows the transfer of information to be visualized clearly.

DOI: http://dx.doi.org/10.5755/j01.itc.41.2.974
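
The grid-density idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the TSAC algorithm itself and makes several assumptions that do not come from the paper: the function names `grid_density` and `density_transfer`, the uniform `cell_size`, the tolerance `tol`, and the four labels are all hypothetical. The sketch only shows how per-grid densities of two snapshots might be compared to tell new, obsolete, and unchanged regions apart.

```python
from collections import Counter


def grid_density(points, cell_size):
    """Return the fraction of points that fall in each grid cell.

    Cells are keyed by integer coordinates (hypothetical convention).
    """
    counts = Counter()
    for p in points:
        cell = tuple(int(x // cell_size) for x in p)
        counts[cell] += 1
    total = len(points)
    if total == 0:
        return {}
    return {cell: n / total for cell, n in counts.items()}


def density_transfer(old_points, new_points, cell_size=1.0, tol=0.05):
    """Label each occupied cell by how its density changed between snapshots.

    The threshold `tol` is an illustrative assumption, not a value from the paper.
    """
    old = grid_density(old_points, cell_size)
    new = grid_density(new_points, cell_size)
    report = {}
    for cell in set(old) | set(new):
        delta = new.get(cell, 0.0) - old.get(cell, 0.0)
        if cell not in old:
            label = "new"          # information that newly appears
        elif cell not in new:
            label = "obsolete"     # information that has disappeared
        elif abs(delta) <= tol:
            label = "unchanged"    # density is stable within tolerance
        else:
            label = "shifted"      # cell persists but its density changed
        report[cell] = (label, delta)
    return report


# Two toy 2-D snapshots collected at different time stages.
old_snapshot = [(0.2, 0.3), (0.7, 0.1), (2.5, 2.5), (2.1, 2.9)]
new_snapshot = [(0.4, 0.4), (0.6, 0.9), (5.5, 5.1), (5.2, 5.8)]

report = density_transfer(old_snapshot, new_snapshot)
for cell in sorted(report):
    print(cell, report[cell])
```

In this toy example the cell around the origin keeps its density, one cell empties out, and one cell becomes occupied, which corresponds to the three kinds of change the abstract names. A real implementation would presumably compute densities over the regions covered by the adapted neuron topology rather than over raw points.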
