Calibrated model-based evidential clustering using bootstrapping

Evidential clustering is an approach to clustering in which cluster-membership uncertainty is represented by a collection of Dempster-Shafer mass functions forming an evidential partition. In this paper, we propose to construct these mass functions by bootstrapping finite mixture models. In a first step, we compute bootstrap percentile confidence intervals for all pairwise probabilities (the probability, for each pair of objects, that the two objects belong to the same class). In a second step, we construct an evidential partition such that the pairwise belief and plausibility degrees approximate the bounds of these confidence intervals. This evidential partition is calibrated, in the sense that the pairwise belief-plausibility intervals contain the true probabilities "most of the time", i.e., with a probability close to the specified confidence level. This frequentist property is verified by simulation, and the practical applicability of the method is demonstrated using several real datasets.
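The bootstrap step can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: the data are one-dimensional, the mixture has two Gaussian components fitted by a basic EM loop, and the pairwise probability of objects i and j is taken to be the sum over clusters of the product of their posterior membership probabilities. All function names (fit_mixture, posteriors, bootstrap_pairwise_intervals) are hypothetical. The construction of the evidential partition from the intervals is not shown.

```python
import math
import random


def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(x, params):
    """Posterior membership probabilities of each point under a fitted mixture."""
    weights, means, sds = params
    out = []
    for xi in x:
        dens = [w * normal_pdf(xi, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens)
        if total > 0.0:
            out.append([d / total for d in dens])
        else:
            # Fallback if every density underflows to zero.
            out.append([1.0 / len(dens)] * len(dens))
    return out


def fit_mixture(x, n_iter=50):
    """Basic EM for a two-component 1-D Gaussian mixture (illustrative assumption)."""
    n = len(x)
    xs = sorted(x)
    half = n // 2
    # Crude initialisation: means of the lower and upper halves of the sample.
    means = [sum(xs[:half]) / half, sum(xs[half:]) / (n - half)]
    grand = sum(xs) / n
    spread = max(math.sqrt(sum((v - grand) ** 2 for v in xs) / n), 1e-3)
    sds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        resp = posteriors(x, (weights, means, sds))  # E-step
        for k in range(2):  # M-step
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:
                continue  # leave an empty component unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, x)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)  # floor avoids degenerate spikes
    return weights, means, sds


def bootstrap_pairwise_intervals(x, n_boot=200, level=0.9, seed=0):
    """Percentile intervals for the same-cluster probability of every pair (i, j)."""
    rng = random.Random(seed)
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    draws = {pair: [] for pair in pairs}
    for _ in range(n_boot):
        # Resample with replacement, refit, then score the ORIGINAL objects.
        resample = [x[rng.randrange(n)] for _ in range(n)]
        post = posteriors(x, fit_mixture(resample))
        for i, j in pairs:
            # Assumed pairwise probability: sum_k p_ik * p_jk.
            draws[(i, j)].append(sum(a * b for a, b in zip(post[i], post[j])))
    alpha = 1.0 - level
    lo_idx = int(math.floor(0.5 * alpha * (n_boot - 1)))
    hi_idx = int(math.ceil((1.0 - 0.5 * alpha) * (n_boot - 1)))
    intervals = {}
    for pair, values in draws.items():
        values.sort()
        intervals[pair] = (values[lo_idx], values[hi_idx])
    return intervals


if __name__ == "__main__":
    data = [0.1 * k for k in range(10)] + [8.0 + 0.1 * k for k in range(10)]
    ci = bootstrap_pairwise_intervals(data, n_boot=30)
    print("same group:", ci[(0, 1)], "different groups:", ci[(0, 19)])
```

Working with pairwise probabilities rather than the membership probabilities themselves has a practical benefit in this sketch: the pairwise quantity does not change when cluster labels are permuted, so bootstrap replicates can be pooled without aligning labels across refits.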
