Trend analysis of categorical data streams with a concept change method

This paper proposes a new method for trend analysis of categorical data streams. A data stream is partitioned into a sequence of time windows, and the records in each window are assumed to carry a number of concepts, each represented as a cluster. A data labeling algorithm identifies the concepts (clusters) of a window from the concepts of the preceding window. A representation of a concept is presented, and a distance between two concepts in consecutive windows is defined to quantify how concepts change from one window to the next. A trend analysis algorithm then computes the trend of concept change in the data stream over the sequence of consecutive time windows. To reveal the causes of concept change, methods are presented for measuring the significance of each attribute in causing the change and the outlier degree of each object. Experiments on real data sets demonstrate the benefits of the trend analysis method.
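The windowing and concept-distance steps described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the concept representation or the distance formula, so the sketch assumes (hypothetically) that each window holds a single concept, that a concept is summarized by per-attribute value frequencies, and that the distance is the mean total variation distance across attributes. The function names `split_into_windows`, `concept_summary`, `concept_distance`, and `change_trend` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Sequence[str]            # one categorical record, e.g. ("red", "small")
Summary = List[Dict[str, float]]  # per-attribute value -> relative frequency


def split_into_windows(stream: List[Record], window_size: int) -> List[List[Record]]:
    """Partition the stream into consecutive, non-overlapping windows."""
    return [stream[i:i + window_size] for i in range(0, len(stream), window_size)]


def concept_summary(records: List[Record]) -> Summary:
    """Assumed concept representation: value frequencies for each attribute."""
    n = len(records)
    num_attrs = len(records[0])
    return [
        {value: count / n for value, count in Counter(r[j] for r in records).items()}
        for j in range(num_attrs)
    ]


def concept_distance(a: Summary, b: Summary) -> float:
    """Assumed distance: mean total variation distance over attributes (0 to 1)."""
    total = 0.0
    for dist_a, dist_b in zip(a, b):
        values = set(dist_a) | set(dist_b)
        total += 0.5 * sum(abs(dist_a.get(v, 0.0) - dist_b.get(v, 0.0)) for v in values)
    return total / len(a)


def change_trend(stream: List[Record], window_size: int) -> List[float]:
    """Distance between the concepts of each pair of consecutive windows."""
    summaries = [concept_summary(w) for w in split_into_windows(stream, window_size)]
    return [concept_distance(s, t) for s, t in zip(summaries, summaries[1:])]
```

For a stream whose first attribute switches from one value to another between two windows while the second attribute stays fixed, `change_trend` returns a single distance of 0.5 under these assumptions. A fuller treatment would keep several clusters per window and match them across windows, as the data labeling step in the abstract suggests.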
