Semi-fuzzy Splitting in Online Divisive-Agglomerative Clustering

Online Divisive-Agglomerative Clustering (ODAC) is an incremental approach for clustering streaming time series that maintains a hierarchy of clusters over time. It builds a tree-like hierarchy of clusters of streams using a top-down strategy based on the correlation between streams. The system also includes an agglomerative phase, which makes the structure dynamic and able to detect structural changes. However, the split decision in the algorithm imposes a crisp boundary between the two new groups, which carries a high risk because the decision must be made from only a small subset of the entire data. In this work we propose a semi-fuzzy approach to the assignment of variables to newly created clusters, aiming for a better trade-off between validity and performance. Experimental work supports the benefits of our approach.
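
To make the idea of a semi-fuzzy split concrete, the following is a minimal sketch and not the procedure evaluated in this work. It assumes a correlation-based dissimilarity of the form sqrt((1 - corr) / 2), a split around the two most dissimilar variables of a cluster, and a hypothetical tolerance `margin`: a variable whose distances to the two pivots differ by less than `margin` is kept in both child clusters instead of being forced to one side. The names `rnomc`, `build_dist`, `semi_fuzzy_split` and `margin` are illustrative choices.

```python
import math


def rnomc(corr):
    """Correlation-based dissimilarity (assumed form): 0 at corr=1, 1 at corr=-1."""
    return math.sqrt((1.0 - corr) / 2.0)


def build_dist(names, corr):
    """Build a symmetric nested dict of dissimilarities.

    corr maps pairs (a, b), with a listed before b in names, to a correlation.
    """
    dist = {a: {} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = rnomc(corr[(a, b)])
            dist[a][b] = d
            dist[b][a] = d
    return dist


def semi_fuzzy_split(names, dist, margin):
    """Split a cluster around its two most dissimilar variables (the pivots).

    margin is a hypothetical tolerance: variables that are nearly
    equidistant from both pivots are assigned to both children.
    """
    # Pivots: the pair of variables with the largest dissimilarity.
    p1, p2 = max(
        ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    left, right = [p1], [p2]
    for v in names:
        if v in (p1, p2):
            continue
        d1, d2 = dist[v][p1], dist[v][p2]
        if abs(d1 - d2) < margin:
            # Ambiguous variable: keep it on both sides of the boundary.
            left.append(v)
            right.append(v)
        elif d1 < d2:
            left.append(v)
        else:
            right.append(v)
    return left, right
```

With `margin` set to 0 the sketch reduces to a crisp split, in which every non-pivot variable goes to exactly one child. Larger values of `margin` keep more borderline variables in both children, at the cost of larger clusters to maintain.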
