A density-based clustering structure mining algorithm for data streams

Advances in hardware and storage technology have created a demand for automatic data mining on data streams. Cluster analysis is an important tool for data stream mining. Although existing density-based clustering algorithms for data streams can discover clusters of arbitrary shape, their effectiveness depends on parameter settings. Moreover, the global parameters used in these algorithms limit their ability to discover overlapping clusters. In this paper, we propose OPCluStream, a novel density-based clustering structure mining algorithm for data streams. It adaptively discovers clusters of arbitrary shape as well as overlapping clusters. While satisfying the one-pass constraint, OPCluStream indexes points in a tree topology in which each point links to related points through directed pointers. This tree topology records the relationships among points, which represent clustering results over a broad range of Eps settings, and clusters can be extracted by transforming the topology into a clustering structure. The clustering structure is equivalent to the index structure and is convenient to use. In addition, OPCluStream clusters efficiently because it indexes points with the tree topology and restricts computation to a limited area when new points are added to the data stream. Experiments on synthetic and real data sets illustrate the effectiveness, efficiency, and insights provided by our method.
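To make the idea of a single link structure that answers many Eps settings more concrete, the following is a minimal toy sketch and not the OPCluStream algorithm itself. It assumes a much simpler rule than the paper's: each arriving point is linked once to its nearest earlier point, and clusters for a given Eps are read off by cutting links longer than Eps. The class name `LinkTree`, its methods, and the brute-force nearest-neighbor search are all hypothetical choices made for illustration; density thresholds, overlapping clusters, and the limited computing area are not modeled.

```python
import math


class LinkTree:
    """Toy one-pass index: each new point links to its nearest earlier point."""

    def __init__(self):
        self.points = []  # coordinates in arrival order
        self.parent = []  # index of the linked earlier point (None for the first)
        self.dist = []    # length of the link to the parent

    def add(self, p):
        """Insert one point; existing links are never revisited (one pass)."""
        best, best_d = None, math.inf
        for i, q in enumerate(self.points):  # brute force, for clarity only
            d = math.dist(p, q)
            if d < best_d:
                best, best_d = i, d
        self.points.append(p)
        self.parent.append(best)
        self.dist.append(best_d)

    def clusters(self, eps):
        """Cut links longer than eps and return one cluster label per point."""
        labels = []
        next_label = 0
        for i in range(len(self.points)):
            if self.parent[i] is not None and self.dist[i] <= eps:
                # A parent always arrived earlier, so its label already exists.
                labels.append(labels[self.parent[i]])
            else:
                labels.append(next_label)
                next_label += 1
        return labels


tree = LinkTree()
for point in [(0, 0), (0, 1), (5, 5), (5, 6), (0, 2)]:
    tree.add(point)

print(tree.clusters(1.5))  # two groups: [0, 0, 1, 1, 0]
print(tree.clusters(7.0))  # one group:  [0, 0, 0, 0, 0]
```

The same stored links serve both queries, so changing Eps requires no second pass over the stream; this is the only property of the abstract's description that the sketch is meant to illustrate.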
