Extending fuzzy c-means to clustering data streams

A data stream is an ordered, continuous sequence of examples that can be examined only once. Mining data streams therefore poses challenges that traditional mining algorithms do not face. Fuzzy c-means (FCM) is a clustering method in which a data point can belong to more than one cluster at the same time. In this paper we extend the FCM algorithm to cluster data streams. Our performance experiments on the KDD-CUP'99 data set show the efficiency of the algorithm.
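To make the idea of partial membership concrete, the following is a minimal sketch of the two update rules of standard (batch) FCM, not of the streaming extension proposed in the paper, whose details the abstract does not give. The function names `fcm_memberships` and `fcm_centers`, the fuzzifier default `m=2.0`, and the use of Euclidean distance are assumptions made for illustration.

```python
import math


def fcm_memberships(point, centers, m=2.0, eps=1e-12):
    """Return the fuzzy membership of `point` in each cluster (values sum to 1).

    Standard FCM rule: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)),
    where d_j is the Euclidean distance from the point to center j.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying on one or more centers is shared equally among those centers.
    on_center = [d < eps for d in dists]
    if any(on_center):
        n = sum(on_center)
        return [1.0 / n if hit else 0.0 for hit in on_center]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]


def fcm_centers(points, memberships, m=2.0):
    """Recompute each center as the mean of all points weighted by u ** m.

    Assumes every cluster has non-zero total weight (illustrative sketch only).
    """
    n_clusters = len(memberships[0])
    dim = len(points[0])
    centers = []
    for j in range(n_clusters):
        weights = [u[j] ** m for u in memberships]
        total = sum(weights)
        centers.append(
            [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(dim)]
        )
    return centers


if __name__ == "__main__":
    # A point at distance 1 from the first center and 3 from the second.
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)]))
```

With `m=2.0`, the example point receives a membership of about 0.9 in the nearer cluster and about 0.1 in the farther one, so it belongs to both clusters to different degrees instead of being assigned to exactly one.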
