Dimension Selective Self-Organizing Maps for clustering high dimensional data

High-dimensional datasets often contain dimensions that are irrelevant to some clusters but relevant to others. These irrelevant dimensions pose difficulties for traditional clustering algorithms, because large discrepancies along them can make objects appear too different to be grouped into the same cluster. Subspace clustering algorithms have been proposed to address this issue; however, the problem remains an open challenge for datasets with noise and outliers. This article presents an approach to subspace and projected clustering based on Self-Organizing Maps (SOM), called the Dimension Selective Self-Organizing Map (DSSOM). DSSOM retains the properties of SOM and is able to find clusters and identify their relevant dimensions simultaneously, during the self-organizing process. The results obtained with DSSOM were promising when compared with state-of-the-art subspace clustering algorithms.
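The central difficulty described above, that widely varying irrelevant dimensions can push members of the same cluster apart, can be illustrated with a toy example. The Python sketch below compares plain Euclidean distance with a relevance-weighted distance, where each dimension's squared difference is scaled by a per-cluster weight. The data points, the function names `euclidean` and `weighted_distance`, and the hand-set vector `relevance_a` are all hypothetical. DSSOM learns relevances during self-organization; this sketch fixes them by hand and does not reproduce DSSOM's actual update rules.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance over all dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_distance(a, b, relevance):
    """Euclidean distance with each dimension scaled by a relevance weight.

    A weight of 0 removes a dimension entirely; a weight of 1 keeps it as-is.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, relevance)))


# Toy 4-D data (hypothetical): cluster A is defined by dimensions 0 and 1
# (values near 1.0); dimensions 2 and 3 are irrelevant noise for cluster A.
p = (1.0, 1.0, 0.0, 9.0)  # member of cluster A
q = (1.1, 0.9, 9.0, 0.0)  # member of cluster A, very different noise values
r = (5.0, 5.0, 0.0, 9.0)  # member of another cluster, same noise values as p

# Hypothetical relevance vector for cluster A, fixed by hand for illustration.
# A method such as DSSOM would instead learn weights like these while training.
relevance_a = (1.0, 1.0, 0.0, 0.0)

print(f"Euclidean  p-q: {euclidean(p, q):.2f}  p-r: {euclidean(p, r):.2f}")
print(f"Weighted   p-q: {weighted_distance(p, q, relevance_a):.2f}  "
      f"p-r: {weighted_distance(p, r, relevance_a):.2f}")
# Euclidean  p-q: 12.73  p-r: 5.66
# Weighted   p-q: 0.14  p-r: 5.66
```

With all four dimensions counted, `p` appears closer to `r` (a different cluster) than to `q` (its own cluster). Once the irrelevant dimensions are down-weighted, `p` and `q` are close while `r` stays far, which is the kind of separation a clustering method that identifies relevant dimensions per cluster aims to recover.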
