Robust clustering of imprecise data

Abstract: Robust fuzzy clustering models for fuzzy data are proposed. Using a “Partitioning Around Medoids” (PAM) approach, a timid robustification of fuzzy clustering for a general class of fuzzy data is proposed first. Three robust fuzzy clustering models are then proposed, based respectively on the so-called metric, noise, and trimmed approaches. The metric approach achieves robustness to outliers by using a “robust” distance measure, the noise approach by introducing a noise cluster represented by a noise prototype, and the trimmed approach by trimming away a certain fraction of the data units. A comparative simulation study is carried out, and measures of misclassification and of robustness with respect to prototype detection in the presence of outliers are developed. Several applications to chemometric and benchmark data are also presented.
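The three robustification ideas named in the abstract can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes that squared distances between data units and medoid prototypes are already available as an n x c array, and it uses the standard fuzzy c-means membership formula. The exponential form of the "robust" distance, the function names, and the parameters `beta`, `delta2`, and `alpha` are all assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_memberships(d2, m=2.0):
    """Standard fuzzy c-means memberships from squared distances (n units x c prototypes)."""
    d2 = np.maximum(np.asarray(d2, dtype=float), 1e-12)  # avoid division by zero at a prototype
    w = d2 ** (-1.0 / (m - 1.0))
    return w / w.sum(axis=1, keepdims=True)

def metric_memberships(d2, beta=1.0, m=2.0):
    """Metric idea (assumed form): use the bounded distance 1 - exp(-beta * d2),
    which caps how much any single far-away unit can contribute."""
    d2 = np.asarray(d2, dtype=float)
    return fuzzy_memberships(1.0 - np.exp(-beta * d2), m)

def noise_memberships(d2, delta2, m=2.0):
    """Noise idea: add a noise prototype at the same squared distance delta2 from every unit."""
    d2 = np.asarray(d2, dtype=float)
    noise_col = np.full((d2.shape[0], 1), float(delta2))
    u = fuzzy_memberships(np.hstack([d2, noise_col]), m)
    return u[:, :-1], u[:, -1]  # memberships in the real clusters, membership in the noise cluster

def trimmed_memberships(d2, alpha=0.1, m=2.0):
    """Trimmed idea: drop the fraction alpha of units farthest from their nearest prototype."""
    d2 = np.asarray(d2, dtype=float)
    n = d2.shape[0]
    n_keep = n - int(np.floor(alpha * n))
    keep = np.sort(np.argsort(d2.min(axis=1))[:n_keep])
    u = np.zeros_like(d2)               # trimmed units keep zero membership everywhere
    u[keep] = fuzzy_memberships(d2[keep], m)
    return u, keep
```

With a unit that lies far from every prototype, the plain memberships still sum to one and split it between the clusters, whereas the noise variant assigns most of its membership to the noise cluster and the trimmed variant excludes it. In a full algorithm these memberships would alternate with a medoid update step, which this sketch leaves out.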
