Symbolic Clustering with Interval-Valued Data

Abstract. While many clustering techniques for interval-valued data have been proposed, no fuzzy clustering method with variable selection has yet been proposed for high-dimension, low-sample-size (HDLSS) interval-valued data. This paper proposes such a method: a novel fuzzy clustering method for interval-valued data with an adaptable variable selection. The method is needed for three reasons. First, the target data in this study are HDLSS data. Because of the curse of dimensionality, clustering such data tends to produce poor classification results. The main cause is noise arising from irrelevant and redundant variables (dimensions), so an adaptable variable selection is needed to reduce or summarize the variables. Second, fuzzy clustering yields results with uncertain cluster boundaries, which suits situations in which the classification of the data is itself uncertain. Compared with hard clustering, this gives results that are more robust to noise in the data, and the cluster memberships are obtained mathematically as continuous values. Third, an adaptable representation of interval-valued data can be used to transform the original data into a more manageable form and thereby avoid the curse of dimensionality. Numerical examples show that the proposed method performs well.
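The abstract does not specify the algorithm, so the following is only a generic illustration and not the proposed method. It is a minimal sketch of standard fuzzy c-means applied to interval-valued data. It assumes that each interval [a, b] is encoded by its center (a + b)/2 and half-range (b - a)/2, which is one common representation and not necessarily the paper's "adaptable representation." The variable selection step is not reproduced. The function names `to_center_range` and `fuzzy_c_means` and the example data are hypothetical. The sketch shows the property described above: each object receives a continuous membership in every cluster, and its memberships sum to 1.

```python
import random


def to_center_range(intervals):
    """Hypothetical encoding: map each object [(a1, b1), ..., (ap, bp)] to a
    2p-dimensional point made of the p centers followed by the p half-ranges."""
    points = []
    for obj in intervals:
        centers = [(a + b) / 2 for a, b in obj]
        half_ranges = [(b - a) / 2 for a, b in obj]
        points.append(centers + half_ranges)
    return points


def sq_dist(x, y):
    """Squared Euclidean distance between two points."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (fuzzifier m > 1). Returns (memberships, prototypes)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])
    protos = []
    for _ in range(n_iter):
        # Prototype update: means of the points weighted by u_ik ** m.
        protos = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            tw = sum(w)
            protos.append([sum(w[i] * points[i][d] for i in range(n)) / tw
                           for d in range(dim)])
        # Membership update: u_ik = 1 / sum_j (d2_ik / d2_ij) ** (1 / (m - 1)).
        new_u = []
        for i in range(n):
            d2 = [sq_dist(points[i], c) for c in protos]
            if any(d == 0.0 for d in d2):
                # The point coincides with a prototype: give it crisp membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in d2]
                total = sum(row)
                row = [v / total for v in row]
            else:
                e = 1.0 / (m - 1.0)
                row = [1.0 / sum((d2[k] / d2[j]) ** e for j in range(n_clusters))
                       for k in range(n_clusters)]
            new_u.append(row)
        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return u, protos


# Hypothetical toy data: four objects, each described by two interval-valued variables.
data = [
    [(1.0, 2.0), (0.0, 1.0)],
    [(1.2, 2.1), (0.1, 1.1)],
    [(8.0, 9.0), (5.0, 6.0)],
    [(8.1, 9.3), (5.2, 6.1)],
]
u, protos = fuzzy_c_means(to_center_range(data), n_clusters=2)
for row in u:
    print([round(v, 3) for v in row])
```

Unlike hard clustering, which would assign each object to exactly one cluster, every row of `u` is a vector of continuous membership degrees in [0, 1].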