Enhancing the DISSFCM Algorithm for Data Stream Classification

Analyzing data streams has become a new challenge in meeting the demands of real-time analytics. Conventional mining techniques cope poorly with the challenges of data streams, which include resource constraints such as memory and running time, as well as the need to process the data in a single scan. Most existing data stream classification methods require labeled samples, which are more difficult and expensive to obtain than unlabeled ones. Semi-supervised learning algorithms can address this problem by using unlabeled samples together with a few labeled ones to build classification models. We recently proposed DISSFCM, an algorithm for data stream classification based on incremental semi-supervised fuzzy clustering. To cope with the evolution of data, DISSFCM dynamically adapts the number of clusters by splitting large-scale clusters. While splitting is effective in improving cluster quality, applying it repeatedly without a counterbalance may produce many small-scale clusters. To solve this problem, in this paper we enhance DISSFCM by introducing a procedure that merges small-scale clusters. Preliminary experimental results on a real-world benchmark dataset show the effectiveness of the method.
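To make the idea of merging small-scale clusters concrete, the following is a minimal sketch under stated assumptions, not the procedure used in DISSFCM. The abstract does not specify the merge criterion, so the function name `merge_small_clusters`, the `min_size` threshold, the choice of the nearest centroid by Euclidean distance, and the size-weighted centroid update are all hypothetical choices made for illustration. Cluster sizes are assumed to be positive counts.

```python
import math


def merge_small_clusters(centroids, sizes, min_size):
    """Illustrative merge step (an assumption, not the DISSFCM procedure).

    Repeatedly takes the smallest cluster and, while it has fewer than
    min_size members, folds it into the cluster with the nearest centroid.
    Sizes are assumed to be positive counts.
    """
    centroids = [list(c) for c in centroids]  # work on copies
    sizes = list(sizes)

    while len(centroids) > 1:
        # Index of the smallest cluster; stop once it is large enough.
        i = min(range(len(sizes)), key=sizes.__getitem__)
        if sizes[i] >= min_size:
            break

        # Nearest other cluster by Euclidean distance between centroids.
        j = min(
            (k for k in range(len(centroids)) if k != i),
            key=lambda k: math.dist(centroids[i], centroids[k]),
        )

        # Size-weighted mean of the two centroids becomes the merged centroid.
        total = sizes[i] + sizes[j]
        centroids[j] = [
            (sizes[i] * a + sizes[j] * b) / total
            for a, b in zip(centroids[i], centroids[j])
        ]
        sizes[j] = total

        # Remove the absorbed cluster.
        del centroids[i]
        del sizes[i]

    return centroids, sizes


if __name__ == "__main__":
    merged_centroids, merged_sizes = merge_small_clusters(
        [[0.0, 0.0], [10.0, 10.0], [1.0, 1.0]], [50, 40, 5], min_size=10
    )
    print(merged_centroids, merged_sizes)
```

In this sketch, a cluster of 5 points near the origin is absorbed by its larger neighbor, which counterbalances the growth in cluster count that repeated splitting would otherwise cause. A fuzzy variant would more likely weight the merge by membership degrees rather than hard counts.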
