Robust variational speech separation using fewer microphones than speakers

A variational inference algorithm for robust speech separation is presented that can recover the underlying speech sources even when there are more sources than microphone observations. The algorithm is based on a generative probabilistic model that fuses time-delay of arrival (TDOA) information with prior information about the speakers and the application to produce an optimal estimate of the underlying speech sources. Simulation results are presented for two, three, and four underlying sources with two microphone observations corrupted by noise. The resulting SNR gains (32 dB with two sources, 23 dB with three sources, and 16 dB with four sources) are significantly higher than those of previous speech separation techniques.
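To illustrate the kind of TDOA cue the model draws on, the following minimal sketch estimates the delay between two microphone signals by brute-force cross-correlation. It is not the variational algorithm described above. The function name `estimate_tdoa`, the `max_lag` search window, the synthetic noise source, and the 16 kHz sample rate are all assumptions made for illustration only.

```python
import random


def estimate_tdoa(mic_a, mic_b, max_lag):
    """Return the lag (in samples) at which mic_b best matches mic_a.

    A positive result means mic_b receives the signal later than mic_a.
    Hypothetical helper: plain cross-correlation, no noise model or priors.
    """
    n = min(len(mic_a), len(mic_b))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate mic_a[i] with mic_b[i + lag] over the overlapping samples.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic single-source example (assumed data, not from the paper):
# white noise reaches microphone B five samples after microphone A.
rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(200)]
true_delay = 5
mic_a = source
mic_b = [0.0] * true_delay + source[:-true_delay]

lag_samples = estimate_tdoa(mic_a, mic_b, max_lag=20)
sample_rate_hz = 16000  # assumed sample rate
lag_seconds = lag_samples / sample_rate_hz
print(lag_samples, lag_seconds)
```

With a single clean source the correlation peak identifies the delay directly. With several simultaneous speakers and additive noise the peaks overlap, which is the setting where combining TDOA information with speaker priors in a probabilistic model becomes useful.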