On Pre-processing Algorithms for Data Stream

Clustering is one of the most important tasks in data mining. Algorithms such as Fuzzy C-Means and Possibilistic C-Means give good results for both static data and data streams. All these algorithms compute cluster centers from a chunk of data, which takes a lot of time. If data arrive faster than the algorithm can process them, part of the data will be lost. To prevent this, a pre-processing algorithm should be used. This paper proposes a pre-processing method for clustering algorithms. Experimental results show that the proposed method is suitable for handling noisy data and can shorten processing time.
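To make the cost argument concrete, the following is a minimal sketch of the standard Fuzzy C-Means iteration applied to a single chunk. It is not the pre-processing method proposed in the paper, which the abstract does not describe. The function name `fcm_chunk`, the fuzzifier default `m=2.0`, the fixed iteration count, and the evenly spaced initialisation are all illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_chunk(chunk, c, m=2.0, iters=100):
    """Run standard Fuzzy C-Means on one chunk and return c centers.

    chunk : list of equal-length tuples of floats (hypothetical input format)
    c     : number of clusters
    m     : fuzzifier, m > 1 (assumed default)
    iters : fixed number of iterations (assumed; no convergence test)
    """
    n, dim = len(chunk), len(chunk[0])
    # Assumed initialisation: evenly spaced points of the chunk.
    centers = [chunk[k * n // c] for k in range(c)]
    exponent = 1.0 / (m - 1.0)

    for _ in range(iters):
        # Membership update: u_ik is proportional to (1 / d_ik^2)^(1/(m-1)),
        # normalised so that each point's memberships sum to 1.
        memberships = []
        for x in chunk:
            inv = [(1.0 / max(sq_dist(x, v), 1e-12)) ** exponent
                   for v in centers]
            total = sum(inv)
            memberships.append([w / total for w in inv])

        # Center update: mean of the points weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            weights = [u[k] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(tuple(
                sum(w * x[d] for w, x in zip(weights, chunk)) / total
                for d in range(dim)
            ))
        centers = new_centers

    return centers


if __name__ == "__main__":
    # Two well-separated groups of points standing in for one stream chunk.
    chunk = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
             (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(fcm_chunk(chunk, c=2))
```

Each iteration touches every point once per cluster, so the work per chunk grows with the chunk size, the number of clusters, the dimension, and the number of iterations. This is the cost that has to stay below the arrival rate of the stream, and it is why reducing the data before clustering can help.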
