Filling the gaps: Gaussian mixture models from noisy, truncated or incomplete samples

Astronomical data often suffer from noise and incompleteness. We extend the common mixture-of-Gaussians density estimation approach to situations with a known sample incompleteness by imputing the missing samples from the current model at each step of the fit. The method, called GMMis, generalizes existing Expectation-Maximization techniques for truncated data to arbitrary truncation geometries and probabilistic rejection processes, as long as they can be specified and do not depend on the density itself. The method accounts for independent multivariate normal measurement errors on each observed sample and recovers an estimate of the error-free distribution from which both observed and unobserved samples are drawn. It can also separate a mixture-of-Gaussians signal from a specified background distribution whose amplitude may be unknown. We compare GMMis to the standard Gaussian mixture model for simple test cases with different types of incompleteness, and apply it to observational data from the NASA Chandra X-ray telescope. The Python code is released as an open-source package at this https URL.
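
The imputation idea can be illustrated with a minimal sketch. It is not the GMMis implementation or its API: it assumes a single one-dimensional Gaussian instead of a mixture, ignores measurement errors and backgrounds, and uses a hard cut as the selection. The names `selection`, `impute_missing`, `fit_with_imputation` and `CUT` are hypothetical and chosen only for this example. At each iteration, samples are drawn from the current model, those that the selection would have rejected are kept as stand-ins for the unobserved data, and the parameters are re-estimated from observed and imputed samples together.

```python
import math
import random

CUT = -0.5  # hypothetical truncation: only samples with x > CUT are observed


def selection(x):
    """Known completeness function: True if a sample at x would be observed."""
    return x > CUT


def mean_std(xs):
    """Maximum-likelihood mean and standard deviation of a list of floats."""
    m = sum(xs) / len(xs)
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def impute_missing(mu, sigma, n_observed, rng):
    """Draw from the current model until n_observed draws pass the selection;
    return the draws that failed it as stand-ins for the unobserved samples."""
    missing, accepted = [], 0
    while accepted < n_observed:
        x = rng.gauss(mu, sigma)
        if selection(x):
            accepted += 1
        else:
            missing.append(x)
    return missing


def fit_with_imputation(observed, n_iter=100, seed=1):
    """Stochastic EM-style fit: alternate imputation and parameter updates."""
    rng = random.Random(seed)
    mu, sigma = mean_std(observed)  # start from the naive (biased) estimate
    history = []
    for _ in range(n_iter):
        full = observed + impute_missing(mu, sigma, len(observed), rng)
        mu, sigma = mean_std(full)
        history.append((mu, sigma))
    # Average the second half of the chain to damp the sampling noise.
    tail = history[n_iter // 2:]
    return (sum(m for m, _ in tail) / len(tail),
            sum(s for _, s in tail) / len(tail))


# Toy data: standard normal samples, of which only those passing the cut survive.
data_rng = random.Random(0)
observed = []
while len(observed) < 2000:
    x = data_rng.gauss(0.0, 1.0)
    if selection(x):
        observed.append(x)

naive_mu, naive_sigma = mean_std(observed)
mu_fit, sigma_fit = fit_with_imputation(observed)
print("naive:  ", naive_mu, naive_sigma)
print("imputed:", mu_fit, sigma_fit)
```

In this toy setup the naive estimate is biased toward the observed side of the cut, while the imputed fit should land much closer to the underlying mean of 0 and standard deviation of 1. The selection function only needs to be evaluable at sampled positions, which is why the same scheme carries over to arbitrary truncation geometries and probabilistic rejection.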