EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering

The Gaussian mixture model (GMM) provides a convenient yet principled framework for clustering, with properties suitable for statistical inference. In this paper, we propose a new model-based clustering algorithm, called EGMM (evidential GMM), which works within the theoretical framework of belief functions to better characterize cluster-membership uncertainty. The cluster membership of each object is represented by a mass function, and the entire dataset is modeled by an evidential Gaussian mixture distribution whose components are defined over the powerset of the desired clusters. The parameters of EGMM are estimated by a specially designed Expectation-Maximization (EM) algorithm. A validity index that allows the proper number of clusters to be determined automatically is also provided. The proposed EGMM is as convenient as the classical GMM, but generates a more informative evidential partition of the dataset. Experiments with synthetic and real datasets demonstrate the good performance of the proposed method compared with other prototype-based and model-based clustering techniques.
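To make the notion of a mass function over the powerset of clusters concrete, the following minimal sketch enumerates the nonempty subsets of three clusters and converts one object's mass function into a probability over single clusters. It is an illustration only, not the EGMM estimation procedure. The cluster labels, the example masses, and the helper names `nonempty_subsets` and `pignistic` are assumptions introduced here, as is the restriction to nonempty subsets. The conversion shown is the standard pignistic transformation, which spreads the mass of each subset evenly over its members.

```python
from itertools import combinations


def nonempty_subsets(clusters):
    """Return every nonempty subset of the cluster labels as a frozenset."""
    items = list(clusters)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]


def pignistic(mass):
    """Spread the mass of each focal set evenly over the clusters it contains.

    `mass` maps frozensets of cluster labels to masses that sum to 1.
    Assumption: no mass is placed on the empty set.
    """
    total = sum(mass.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("masses must sum to 1")
    betp = {}
    for focal_set, m in mass.items():
        share = m / len(focal_set)
        for cluster in focal_set:
            betp[cluster] = betp.get(cluster, 0.0) + share
    return betp


# Hypothetical example: three clusters give 2**3 - 1 = 7 candidate subsets.
clusters = ["w1", "w2", "w3"]
focal_sets = nonempty_subsets(clusters)

# Hypothetical mass function for one object: mostly w1, some doubt
# between w1 and w2, and a little total ignorance.
mass = {
    frozenset({"w1"}): 0.6,
    frozenset({"w1", "w2"}): 0.3,
    frozenset(clusters): 0.1,
}
betp = pignistic(mass)
```

A mass on `{"w1", "w2"}` expresses doubt between two clusters without committing to either one, which a single probability vector over clusters cannot express. This is the sense in which an evidential partition is more informative than a probabilistic one.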
