Clustering Large, Multi-level Data Sets: An Approach Based on Kohonen Self Organizing Maps

Standard clustering methods do not handle truly large data sets and fail to take into account multi-level data structures. This work outlines an approach to clustering that integrates the Kohonen Self Organizing Map (SOM) with other clustering methods. To take multi-level structures into account, a statistical model is proposed in which the mixing coefficients of a mixture of distributions may depend on higher-level variables. In a first step, the SOM provides a substantial data reduction, which makes a variety of ascending and divisive clustering algorithms computationally accessible. In a second step, statistical modelling provides both a direct means of treating multi-level structures and a framework for model-based clustering. The interplay of these two steps is illustrated on an example of nutritional data from a multicenter study on nutrition and cancer, known as EPIC.
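The first step described above (data reduction by a SOM, followed by an ascending clustering of the map prototypes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`train_som`, `agglomerate`, `som_then_cluster`), the rectangular grid, the linear decay of learning rate and neighbourhood width, and the use of a Ward-type merge criterion weighted by the number of observations per unit are all choices made here for the example. The multi-level mixture model of the second step is not covered.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def best_unit(protos, x):
    """Index of the prototype closest to observation x."""
    return min(range(len(protos)), key=lambda k: sq_dist(protos[k], x))


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM on a rows x cols grid (assumed schedule: linear decay)."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [(r, c) for r in range(rows) for c in range(cols)]
    protos = [list(rng.choice(data)) for _ in units]  # init from data points
    sigma0 = sigma0 or max(rows, cols) / 2.0
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            frac = step / total
            lr = lr0 * (1.0 - frac)
            sigma = max(sigma0 * (1.0 - frac), 0.5)
            br, bc = units[best_unit(protos, x)]
            for k, (r, c) in enumerate(units):
                # Gaussian neighbourhood measured on the grid, not in data space
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma * sigma))
                p = protos[k]
                for j in range(dim):
                    p[j] += lr * h * (x[j] - p[j])
            step += 1
    return protos


def agglomerate(protos, counts, n_clusters):
    """Ascending clustering of non-empty units with a Ward-type criterion."""
    clusters = [(list(p), w, [k])
                for k, (p, w) in enumerate(zip(protos, counts)) if w > 0]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, wa, _ = clusters[a]
                cb, wb, _ = clusters[b]
                # increase in within-cluster sum of squares if a and b merge
                cost = wa * wb / (wa + wb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, wa, ma = clusters[a]
        cb, wb, mb = clusters[b]
        centroid = [(wa * u + wb * v) / (wa + wb) for u, v in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((centroid, wa + wb, ma + mb))
    return {k: label for label, (_, _, members) in enumerate(clusters)
            for k in members}


def som_then_cluster(data, rows=4, cols=4, n_clusters=2, seed=0):
    """Reduce data to SOM prototypes, cluster them, label each observation."""
    protos = train_som(data, rows, cols, seed=seed)
    bmus = [best_unit(protos, x) for x in data]
    counts = [0] * len(protos)
    for b in bmus:
        counts[b] += 1
    unit_label = agglomerate(protos, counts, n_clusters)
    return [unit_label[b] for b in bmus]
```

The point of the design is that the quadratic-cost merge loop runs over at most `rows * cols` prototypes rather than over all observations, which is what makes hierarchical methods usable on large data sets; each observation simply inherits the cluster of its best-matching unit.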
