Fuzziness in data analysis: Towards accuracy and robustness

The first aim of this paper is to emphasize the use of fuzziness in data analysis to capture information that has traditionally been disregarded, at a cost to the precision of the conclusions. Fuzziness can enter the data analysis process at various stages, but the main focus here is fuzziness in the data. Depending on the nature of the fuzzy data, or on the purpose for which they are handled, different approaches should be applied. We attempt to clarify this difference, focusing on the so-called ontic approach in contrast to the epistemic approach. The second aim is to underline the need to consider robust methods that reduce the misleading impact of outliers in fuzzy data analysis. We propose trimming as a general and intuitive method for discarding outliers. We exemplify this approach with the ontic fuzzy trimmed mean and variance, and highlight the differences from the epistemic case. All the discussions and developments are illustrated by means of a case study concerning the perception of lengths of men and women.
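
To make the trimming idea concrete, the following is a minimal sketch and not the definition used in the paper. It takes the ontic view, treating each fuzzy datum as a complete observation in its own right, and rests on several assumptions of ours: fuzzy data are trapezoidal and stored as 4-tuples (a, b, c, d); the mean is the componentwise average of those parameters; the distance is a plain Euclidean distance on the four parameters, standing in for a proper metric between fuzzy sets; and the trimmed subsample is found by brute force as the subset with the smallest variance. All function names are hypothetical.

```python
from itertools import combinations


def fuzzy_mean(sample):
    """Componentwise mean of trapezoidal fuzzy numbers given as (a, b, c, d)."""
    n = len(sample)
    return tuple(sum(x[i] for x in sample) / n for i in range(4))


def sq_dist(x, y):
    """Stand-in squared distance (assumption): Euclidean on the 4 parameters."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def fuzzy_variance(sample):
    """Average squared distance of the observations to their fuzzy mean."""
    m = fuzzy_mean(sample)
    return sum(sq_dist(x, m) for x in sample) / len(sample)


def trimmed_mean_variance(sample, beta):
    """Trim a proportion beta of the sample, keeping the least-dispersed subset.

    Brute force over all subsets of the retained size, so this is only
    practical for small samples.
    """
    n = len(sample)
    keep = n - int(beta * n)  # number of observations retained
    best = min(combinations(sample, keep), key=fuzzy_variance)
    return fuzzy_mean(best), fuzzy_variance(best)


if __name__ == "__main__":
    # Three similar trapezoids and one gross outlier (made-up values).
    data = [(1, 2, 3, 4), (2, 3, 4, 5), (1.5, 2.5, 3.5, 4.5), (50, 60, 70, 80)]
    print("untrimmed mean:", fuzzy_mean(data))
    print("trimmed mean, variance:", trimmed_mean_variance(data, beta=0.25))
```

In this toy sample the untrimmed mean is pulled far toward the outlier, whereas trimming a quarter of the observations discards it. An epistemic treatment, in which each fuzzy datum describes an imprecisely known real value, would generally call for different computations, which this sketch does not attempt.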
