Estimating Gaussian Mixture Models from Data with Missing Features

Maximum likelihood (ML) fitting of Gaussian mixture models (GMMs) to feature data is most efficiently handled by the EM algorithm [1, 2, 3, 4]. The EM algorithm is directly applicable to multivariate data in which every feature is always present, so that there are no missing values. Unfortunately, missing values are common and can be caused by either random or systematic effects. This study presents a novel algorithm for estimating the parameters of GMMs when some values are randomly missing. The approach is Bayesian in the missing values and ML in the GMM parameters. The same model can be applied to heteroscedastic data and to indirectly observable mixed Gaussian observations.