Paper Information - Symbolic Clustering with Interval-Valued Data

Symbolic Clustering with Interval-Valued Data

Abstract While many clustering techniques for interval-valued data have been proposed, none combines fuzzy clustering with variable selection for high-dimension, low-sample-size interval-valued data. This paper proposes such a fuzzy clustering method for interval-valued data with adaptable variable selection. The method is needed for three reasons. First, the target data in this study are high-dimension, low-sample-size data. Because of the curse of dimensionality, classification results for this type of data tend to be poor, mainly because of noise from irrelevant and redundant variables (dimensions). An adaptable variable selection is therefore needed to reduce or summarize the variables. Second, fuzzy clustering yields results with uncertain cluster boundaries, which suits the uncertainty inherent in classifying data. Compared with hard clustering, this makes the result more robust to noise in the data, and mathematically the result is obtained as continuous values. Third, an adaptable representation of interval-valued data can be exploited to transform the original data into a more manageable form and so avoid the curse of dimensionality. Numerical examples show high performance for the proposed method.

Mika Sato-Ilic | M. Sato-Ilic

[1] J. Friedman. Clustering objects on subsets of attributes, 2002.

[2] Mika Sato-Ilic, et al. An adaptive cluster-target covariance based principal component analysis for interval-valued data, 2010, International Conference on Fuzzy Systems.

[3] D. Hand, et al. Clustering objects on subsets of attributes, 2004.

[4] Hans-Hermann Bock, et al. Analysis of Symbolic Data, 2000.

[5] Edwin Diday, et al. Symbolic Data Analysis: Conceptual Statistics and Data Mining (Wiley Series in Computational Statistics), 2007.

[6] Mika Sato-Ilic. Fuzzy Variable Selection with Degree of Classification Based on Dissimilarity between Distributions of Variables, 2008.

[7] Yoshiharu Sato, et al. Extended Fuzzy Clustering Models for Asymmetric Similarity, 1995.

[8] J. Welsh, et al. Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer, 2001, Cancer Research.

[9] Trevor Hastie, et al. The Elements of Statistical Learning, 2001.

[10] J. Friedman, et al. Clustering objects on subsets of attributes (with discussion), 2004.

[11] Ali S. Hadi, et al. Finding Groups in Data: An Introduction to Cluster Analysis, 1991.