Visualising the Cluster Structure of Data Streams

The increasing availability of streaming data is a consequence of the continuing advancement of data acquisition technology. Such data present new challenges to the various data analysis communities. Clustering has long been a fundamental procedure for acquiring knowledge from data, and new tools are emerging that allow the clustering of data streams. However, the dynamic, temporal components of streaming data pose additional challenges for the development of stream clustering and associated visualisation techniques. In this work we combine a streaming clustering framework with an extension of a static cluster visualisation method to construct a surface that graphically represents the clustering structure of the data stream. The proposed method, OpticsStream, provides intuitive representations of both the clustering structure and the manner in which this structure changes through time.
