A sparse fuzzy c-means algorithm based on sparse clustering framework

Abstract: Fuzzy c-means (FCM) is a well-known clustering method with wide applications in statistics, pattern recognition and data mining. However, its performance on large-scale, high-dimensional data is not satisfactory. In this paper, we propose the sparse fuzzy c-means (SFCM) algorithm, which adapts traditional FCM to high-dimensional data clustering on the basis of Witten's sparse clustering framework. SFCM embeds feature selection into FCM via sparse feature weighting, which also makes the resulting model easier to interpret. Experiments and comparisons indicate that the method is able to select important features and to improve efficiency on large-scale clustering problems.
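The sparse weighting step can be illustrated with a minimal sketch. It assumes the weight update of the sparse clustering framework in its usual form: given nonnegative per-feature scores a_j, maximize sum_j w_j a_j subject to ||w||_2 <= 1, ||w||_1 <= s and w_j >= 0, which is solved by soft-thresholding followed by normalization. The function names, the tuning parameter `s`, and the score vector `a` are hypothetical; the abstract does not state how SFCM computes the scores, so this is not necessarily the authors' exact procedure.

```python
import numpy as np


def soft_threshold(a, delta):
    """Shrink each entry of a toward zero by delta."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def sparse_weights(a, s, tol=1e-8, max_iter=100):
    """Approximately maximize sum_j w_j * a_j subject to
    ||w||_2 <= 1, ||w||_1 <= s, w_j >= 0 (assumes s >= 1).

    a : hypothetical per-feature scores, e.g. a between-cluster
        dispersion for each feature (an assumption, not from the paper).
    """
    a = np.maximum(np.asarray(a, dtype=float), 0.0)  # keep the positive part
    if not a.any():
        raise ValueError("at least one feature score must be positive")

    # If the unconstrained solution already meets the L1 bound, use it.
    w = a / np.linalg.norm(a)
    if w.sum() <= s:
        return w

    # Otherwise bisect on the threshold until the L1 norm is close to s.
    lo, hi = 0.0, a.max()
    for _ in range(max_iter):
        delta = 0.5 * (lo + hi)
        w = soft_threshold(a, delta)
        w /= np.linalg.norm(w)
        if w.sum() > s:
            lo = delta  # still too dense: threshold harder
        else:
            hi = delta  # feasible: try a smaller threshold
        if hi - lo < tol:
            break
    return w


if __name__ == "__main__":
    scores = [5.0, 3.0, 1.0, 0.1]
    print(sparse_weights(scores, s=1.2))
```

A smaller `s` drives more weights to exactly zero, which is how feature selection enters the clustering objective. In a full algorithm this step would alternate with the usual FCM membership and centroid updates computed on the weighted features.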
