Visualized Analysis of Mixed Numeric and Categorical Data Via Extended Self-Organizing Map

Many real-world datasets are of mixed type, containing both numeric and categorical attributes. Analyzing such datasets is difficult but important. In this paper, we propose an extended self-organizing map (SOM), called MixSOM, which uses a data structure called a distance hierarchy to handle numeric and categorical values directly and in a unified manner. Moreover, the extended model regularizes the prototype distance between neighboring neurons in proportion to their map distance, so that cluster structures are portrayed more faithfully on the map. We conduct extensive experiments on several synthetic and real-world datasets to demonstrate the capability of the model and to compare MixSOM with several existing models, including Kohonen's SOM, the generalized SOM, and the visualization-induced SOM. The results show that MixSOM reflects the structure of mixed-type data better than the other models and facilitates further analysis of the data, such as exploration at various levels of granularity.
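The following Python is a minimal sketch of how a distance hierarchy can place numeric and categorical values on a common footing. It assumes the formulation commonly used for distance hierarchies: each value is a point (anchor, offset) in a tree with weighted links, and the distance between points X and Y is d_X + d_Y - 2·d_LCP, where d_LCP is the depth of their least common point. The names `DistanceHierarchy`, `Point`, `categorical_hierarchy`, `numeric_hierarchy`, `numeric_point` and `mixed_distance` are hypothetical. The unit link weights, the min-max normalization of numeric values and the Euclidean combination across attributes are also illustrative assumptions, not MixSOM's published definitions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """A position in a distance hierarchy: the node it points toward (anchor)
    and its weighted distance from the root (offset)."""
    anchor: str
    offset: float


class DistanceHierarchy:
    """A tree with weighted links; depth[n] is the weighted root-to-n distance."""

    def __init__(self, root: str = "ANY"):
        self.parent = {root: None}
        self.depth = {root: 0.0}

    def add(self, node: str, parent: str, weight: float = 1.0) -> None:
        self.parent[node] = parent
        self.depth[node] = self.depth[parent] + weight

    def leaf(self, node: str) -> Point:
        """The point sitting exactly on a node (e.g. a categorical value)."""
        return Point(node, self.depth[node])

    def _path_to_root(self, node: str) -> list:
        path = []
        while node is not None:
            path.append(node)
            node = self.parent[node]
        return path

    def _lca_depth(self, a: str, b: str) -> float:
        on_path_a = set(self._path_to_root(a))
        # Walking upward from b, the first node also on a's path is the deepest common ancestor.
        for node in self._path_to_root(b):
            if node in on_path_a:
                return self.depth[node]
        raise ValueError(f"{a!r} and {b!r} are not in the same hierarchy")

    def distance(self, x: Point, y: Point) -> float:
        # The least common point is the deepest position shared by the root-to-x and
        # root-to-y paths: x or y itself if one lies on the other's path, otherwise
        # the lowest common ancestor of the two anchors.
        d_lcp = min(x.offset, y.offset, self._lca_depth(x.anchor, y.anchor))
        return x.offset + y.offset - 2.0 * d_lcp


def categorical_hierarchy(values, weight: float = 1.0) -> DistanceHierarchy:
    """Flat two-level hierarchy ANY -> value (the link weight is an assumption)."""
    h = DistanceHierarchy("ANY")
    for v in values:
        h.add(v, "ANY", weight)
    return h


def numeric_hierarchy() -> DistanceHierarchy:
    """One-link hierarchy MIN -> MAX of weight 1, for min-max normalized values."""
    h = DistanceHierarchy("MIN")
    h.add("MAX", "MIN", 1.0)
    return h


def numeric_point(x: float, lo: float, hi: float) -> Point:
    """A numeric value becomes a point on the MIN -> MAX link (assumes hi > lo)."""
    return Point("MAX", (x - lo) / (hi - lo))


def mixed_distance(hierarchies, xs, ys) -> float:
    """Combine per-attribute hierarchy distances Euclidean-style (an assumption)."""
    return math.sqrt(sum(h.distance(x, y) ** 2 for h, x, y in zip(hierarchies, xs, ys)))


# Usage: two records with one categorical and one numeric attribute.
color = categorical_hierarchy(["Red", "Green", "Blue"])
age = numeric_hierarchy()
a = [color.leaf("Red"), numeric_point(30, 0, 100)]
b = [color.leaf("Blue"), numeric_point(45, 0, 100)]
print(mixed_distance([color, age], a, b))  # sqrt(2**2 + 0.15**2), about 2.0056

# A prototype's categorical component may sit partway along a link.
proto = Point("Red", 0.4)
print(color.distance(proto, color.leaf("Red")), color.distance(proto, color.leaf("Blue")))
```

Because a prototype's categorical component can lie partway along a link, such as `Point("Red", 0.4)` above, a prototype can move gradually between categorical values during training in the same way its numeric components do.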
