A dynamic data granulation through adjustable fuzzy clustering

In this study, we develop a concept of dynamic data granulation realized in the presence of incoming data organized as so-called data snapshots. For each snapshot, we reveal a structure by running fuzzy clustering. The proposed adjustable fuzzy C-means (FCM) algorithm exhibits a number of useful features that are directly associated with the dynamic nature of the underlying data: (a) the number of clusters is adjusted from one data snapshot to the next in order to capture the varying structure of the patterns and its complexity, and (b) continuity between consecutively discovered structures is retained, viz. the clusters formed for a given data snapshot are obtained by evolving the clusters discovered in the preceding snapshot. We present a detailed clustering algorithm in which the mechanisms for adjusting information granularity (the number of clusters) result from solving well-defined optimization tasks. Cluster splitting is guided by conditional FCM, while cluster merging involves two neighboring prototypes. The level of information granularity is controlled throughout the process by a reconstruction criterion, which quantifies the error resulting from pattern granulation and de-granulation. Numeric experiments illustrate the approach.
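To make the reconstruction criterion concrete, the following is a minimal sketch, not the algorithm of the paper itself. It assumes the commonly used FCM-based formulation: each pattern is granulated into membership degrees with respect to the prototypes and then de-granulated as a membership-weighted combination of those prototypes. The function names, the default fuzzification coefficient `m = 2.0`, and the use of the summed squared Euclidean error are assumptions made for illustration only.

```python
def memberships(x, prototypes, m=2.0):
    """Granulation: FCM-style membership degrees of pattern x (assumed form)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in prototypes]
    # A pattern that coincides with one or more prototypes belongs fully to them.
    zeros = [i for i, d in enumerate(d2) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d2))]
    p = 1.0 / (m - 1.0)
    inv = [(1.0 / d) ** p for d in d2]
    total = sum(inv)
    return [w / total for w in inv]


def reconstruct(x, prototypes, m=2.0):
    """De-granulation: rebuild x from its memberships and the prototypes."""
    w = [u ** m for u in memberships(x, prototypes, m)]
    s = sum(w)
    return [
        sum(w[i] * prototypes[i][k] for i in range(len(prototypes))) / s
        for k in range(len(x))
    ]


def reconstruction_error(data, prototypes, m=2.0):
    """Sum of squared differences between patterns and their reconstructions."""
    total = 0.0
    for x in data:
        x_hat = reconstruct(x, prototypes, m)
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total


if __name__ == "__main__":
    snapshot = [[0.0], [2.0]]
    coarse = [[1.0]]          # one cluster
    fine = [[0.0], [2.0]]     # two clusters
    print(reconstruction_error(snapshot, coarse))
    print(reconstruction_error(snapshot, fine))
```

Under these assumptions, a larger number of well-placed prototypes gives a lower reconstruction error. This is the kind of signal that could indicate whether the number of clusters should be increased (splitting) or can be reduced (merging) for a new snapshot.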
