Density Based Self Organizing Incremental Neural Network for data stream clustering

Clustering is an important technique widely used in areas such as machine learning, pattern recognition, and data analysis. Data stream clustering, in which data objects are processed as an ordered sequence, is a branch of clustering that has drawn much attention in recent years. In this paper, we propose an unsupervised learning neural network named Density Based Self Organizing Incremental Neural Network (DenSOINN) for data stream clustering tasks. DenSOINN is a self-organizing competitive network that grows incrementally, learning nodes that fit the distribution of the input data. It combines on-line unsupervised learning with topology learning by means of the competitive Hebbian learning rule [19]. By adopting a density-based clustering mechanism, DenSOINN can discover arbitrarily shaped clusters and reduce the negative effect of noise. In addition, we adopt a self-adaptive distance framework so that the network performs well on unnormalized input data. Experiments show that DenSOINN performs comparably well on both raw and normalized data.
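The competitive Hebbian learning rule mentioned above can be illustrated with a minimal sketch. This is not the DenSOINN algorithm itself: it shows only the generic rule of connecting the two nodes nearest to each input, followed by a simple winner update. The function name `chl_step`, the learning rate `lr`, and the list-based node storage are assumptions made for illustration.

```python
import math

def chl_step(nodes, edges, x, lr=0.1):
    """One competitive Hebbian learning step for input x (illustrative only).

    nodes: list of weight vectors (lists of floats), at least two entries
    edges: set of frozenset({i, j}) index pairs, updated in place
    lr:    hypothetical learning rate for moving the winner toward x
    Returns the indices of the winner and the second winner.
    """
    # Rank nodes by Euclidean distance to the input.
    order = sorted(range(len(nodes)), key=lambda i: math.dist(nodes[i], x))
    first, second = order[0], order[1]

    # Competitive Hebbian learning: connect the two nearest nodes.
    edges.add(frozenset((first, second)))

    # Move the winner a small step toward the input.
    nodes[first] = [w + lr * (xi - w) for w, xi in zip(nodes[first], x)]
    return first, second

nodes = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
edges = set()
winner, runner_up = chl_step(nodes, edges, [0.2, 0.1])
```

In this toy run the input lies closest to the first two nodes, so an edge is created between them and the far node stays unconnected. An incremental network of this kind would additionally insert and delete nodes and edges over time; those steps are omitted here because the abstract does not specify them.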

[1]  Hans-Peter Kriegel, et al.  A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise, 1996, KDD.

[2]  Shen Furao, et al.  An incremental network for on-line unsupervised classification and topology learning, 2006, Neural Networks.

[3]  André Carlos Ponce de Leon Ferreira de Carvalho, et al.  Data stream clustering: A survey, 2013, CSUR.

[4]  Christian Sohler, et al.  StreamKM++: A clustering algorithm for data streams, 2010, JEAL.

[5]  Xiong Xiao, et al.  A Load-Balancing Self-Organizing Incremental Neural Network, 2014, IEEE Transactions on Neural Networks and Learning Systems.

[6]  Jonathan J. Hull, et al.  A Database for Handwritten Text Recognition Research, 1994, IEEE Trans. Pattern Anal. Mach. Intell.

[7]  Thomas Martinetz, et al.  Topology representing networks, 1994, Neural Networks.

[8]  Ana L. N. Fred, et al.  Robust data clustering, 2003, IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Shen Furao, et al.  A fast nearest neighbor classifier based on self-organizing incremental neural network, 2008, Neural Networks.

[10]  Robert M. Haralick, et al.  Feature normalization and likelihood-based similarity measures for image retrieval, 2001, Pattern Recognit. Lett.

[11]  T. Martinetz, et al.  Competitive Hebbian Learning Rule Forms Perfectly Topology Preserving Maps, 1993.

[12]  S. P. Lloyd, et al.  Least squares quantization in PCM, 1982, IEEE Trans. Inf. Theory.

[13]  Bernd Fritzke, et al.  A Growing Neural Gas Network Learns Topologies, 1994, NIPS.

[14]  Thomas Martinetz, et al.  'Neural-gas' network for vector quantization and its application to time-series prediction, 1993, IEEE Trans. Neural Networks.

[15]  Shen Furao, et al.  An enhanced self-organizing incremental neural network for online unsupervised learning, 2007, Neural Networks.

[16]  Davide Anguita, et al.  A Public Domain Dataset for Human Activity Recognition using Smartphones, 2013, ESANN.

[17]  Larry D. Hostetler, et al.  The estimation of the gradient of a density function, with applications in pattern recognition, 1975, IEEE Trans. Inf. Theory.

[18]  Jonathan J. Hull.  A Database for Handwritten Text Recognition Research, 1994.

[19]  Tian Zhang, et al.  BIRCH: A New Data Clustering Algorithm and Its Applications, 1997, Data Mining and Knowledge Discovery.

[20]  Sean Hughes, et al.  Clustering by Fast Search and Find of Density Peaks, 2016.

[21]  Aoying Zhou, et al.  Density-Based Clustering over an Evolving Data Stream with Noise, 2006, SDM.

[22]  Alessandro Laio, et al.  Clustering by fast search and find of density peaks, 2014, Science.

[23]  Philip S. Yu, et al.  A Framework for Clustering Evolving Data Streams, 2003, VLDB.

[24]  J. MacQueen.  Some methods for classification and analysis of multivariate observations, 1967.

[25]  Teuvo Kohonen, et al.  Self-organized formation of topologically correct feature maps, 2004, Biological Cybernetics.