Regularized Mixture Density Estimation With an Analytical Setting of Shrinkage Intensities

In this paper, we propose a method for P-variate probability density estimation that assumes a Gaussian mixture model (GMM). The method uses regularization to improve the estimation accuracy of the covariance matrices of the GMM components. We derive an expectation-maximization (EM) algorithm for fitting the resulting regularized GMM (RGMM), in which the component covariance matrices are estimated by Ledoit-Wolf-type shrinkage with analytically set shrinkage intensities. We compare the method with recent model-based and variational Bayes approximation methods on synthetic and real data sets. The results show that RGMM significantly improves multivariate probability density estimation relative to these methods on both the synthetic and the real data sets.
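The abstract does not give the shrinkage formula, so the following is only a minimal sketch of the general idea, not the authors' RGMM. It assumes the standard Ledoit-Wolf (2004) estimator, which shrinks a sample covariance toward a scaled identity with an analytically computed intensity, and plugs it into the M-step of EM using responsibility-weighted covariances. The weighted form of the intensity, the farthest-first initialization, and the names `lw_shrink`, `log_gauss` and `fit_rgmm` are hypothetical choices made for illustration.

```python
import numpy as np

def lw_shrink(Z, w):
    """Ledoit-Wolf-style shrinkage of a weighted covariance toward m*I.

    Z: (n, p) rows already centred on the component mean.
    w: (n,) non-negative weights summing to 1 (normalized responsibilities).
    Returns (Sigma, rho), with shrinkage intensity rho in [0, 1].
    Assumption: weighted analogue of the Ledoit-Wolf (2004) intensity,
    not a formula taken from the paper.
    """
    n, p = Z.shape
    S = (Z * w[:, None]).T @ Z                    # weighted sample covariance
    m = np.trace(S) / p                           # target scale: mean eigenvalue
    d2 = np.sum((S - m * np.eye(p)) ** 2) / p     # spread of S around m*I
    sq = np.sum(Z ** 2, axis=1)                   # ||z_i||^2
    quad = np.einsum("ij,jk,ik->i", Z, S, Z)      # z_i^T S z_i
    # ||z_i z_i^T - S||_F^2 = ||z_i||^4 - 2 z_i^T S z_i + ||S||_F^2
    dev = sq ** 2 - 2.0 * quad + np.sum(S ** 2)
    b2 = min(max(np.sum(w ** 2 * dev) / p, 0.0), d2)  # estimation-error term
    rho = b2 / d2 if d2 > 0 else 1.0
    return rho * m * np.eye(p) + (1.0 - rho) * S, rho

def log_gauss(X, mu, Sigma):
    """Row-wise log N(x | mu, Sigma) via a Cholesky factorization."""
    p = X.shape[1]
    L = np.linalg.cholesky(Sigma)
    Y = np.linalg.solve(L, (X - mu).T)            # L^{-1}(x - mu), shape (p, n)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + np.sum(Y ** 2, axis=0))

def fit_rgmm(X, K, n_iter=200, tol=1e-8, seed=0):
    """EM for a GMM whose M-step covariances are shrunk by lw_shrink."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Hypothetical initialization: farthest-first centres, then hard
    # nearest-centre responsibilities.
    idx = [int(rng.integers(n))]
    for _ in range(1, K):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1).min(axis=1)
        idx.append(int(np.argmax(d)))
    d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    R = np.eye(K)[np.argmin(d, axis=1)]
    prev = -np.inf
    for _ in range(n_iter):
        # M-step: mixing weights, means, shrunk covariances
        Nk = np.maximum(R.sum(axis=0), 1e-12)
        pi = Nk / Nk.sum()
        mu = (R.T @ X) / Nk[:, None]
        Sigma = np.empty((K, p, p))
        rho = np.empty(K)
        for k in range(K):
            Sigma[k], rho[k] = lw_shrink(X - mu[k], R[:, k] / Nk[k])
        # E-step: responsibilities normalized with log-sum-exp
        logp = np.column_stack(
            [np.log(pi[k]) + log_gauss(X, mu[k], Sigma[k]) for k in range(K)])
        mx = logp.max(axis=1, keepdims=True)
        lse = mx[:, 0] + np.log(np.exp(logp - mx).sum(axis=1))
        R = np.exp(logp - lse[:, None])
        ll = lse.sum()
        if abs(ll - prev) < tol * abs(ll):
            break
        prev = ll
    return pi, mu, Sigma, rho, ll
```

Because the shrunk covariance is not the exact maximizer of the expected complete-data log-likelihood, this loop is not guaranteed to increase the likelihood monotonically; it simply stops when the relative change falls below `tol`. The shrunk matrix keeps the trace of the sample covariance and stays positive definite whenever rho > 0, even if a component's effective sample size is smaller than P.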
