On the approximate W-disjoint orthogonality of speech

It is possible to blindly separate an arbitrary number of sources given just two anechoic mixtures, provided the time-frequency representations of the sources do not overlap, a condition which we call W-disjoint orthogonality. We define a power-weighted two-dimensional histogram, constructed from the ratio of the time-frequency representations of the mixtures, which is shown to have one peak for each source, with the peak location corresponding to the relative amplitude and delay mixing parameters. The time-frequency points that yield estimates in a given peak are exactly the non-zero magnitude components of one of the sources. We introduce the concept of approximate W-disjoint orthogonality, present experimental results demonstrating the level of approximate W-disjoint orthogonality of speech in mixtures of various orders, and show that even with imperfect W-disjoint orthogonality the histogram can be used to determine the mixing parameters and separate the sources. Example demixing results can be found online: http://www.princeton.edu/~srickard/bss.html
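The histogram construction described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes synthetic time-frequency data with exactly disjoint supports, a mixture ratio of the form a·exp(-iωδ), a weight of |X1·X2| as the "power weighting", and delays small enough that the phase does not wrap. The function names (`mixing_histogram`, `top_peaks`, `demix`) and all parameter values are hypothetical, and picking the largest bins is a naive stand-in for proper peak detection.

```python
import numpy as np

def mixing_histogram(X1, X2, omega, bins=50, a_range=(0.0, 2.0), d_range=(-2.0, 2.0)):
    """Power-weighted 2-D histogram of per-point (amplitude, delay) estimates."""
    W = np.broadcast_to(omega, X1.shape)
    valid = (np.abs(X1) > 1e-12) & (W != 0)        # avoid division by zero
    ratio = np.zeros_like(X1)
    ratio[valid] = X2[valid] / X1[valid]
    a_hat = np.abs(ratio)                          # relative amplitude estimate
    d_hat = np.zeros(X1.shape)
    d_hat[valid] = -np.angle(ratio[valid]) / W[valid]   # relative delay estimate
    weights = np.abs(X1 * X2)                      # assumed form of the power weighting
    hist, a_edges, d_edges = np.histogram2d(
        a_hat[valid], d_hat[valid], bins=bins,
        range=[a_range, d_range], weights=weights[valid])
    return hist, a_edges, d_edges, a_hat, d_hat

def top_peaks(hist, a_edges, d_edges, n):
    """Naive peak picking: centers of the n largest bins, sorted by amplitude."""
    flat = np.argsort(hist, axis=None)[::-1][:n]
    ia, idl = np.unravel_index(flat, hist.shape)
    a_c = 0.5 * (a_edges[ia] + a_edges[ia + 1])
    d_c = 0.5 * (d_edges[idl] + d_edges[idl + 1])
    peaks = np.column_stack([a_c, d_c])
    return peaks[np.argsort(peaks[:, 0])]

def demix(X1, a_hat, d_hat, peaks):
    """Assign each time-frequency point to its nearest peak and mask X1."""
    dist = ((a_hat[None] - peaks[:, 0, None, None]) ** 2
            + (d_hat[None] - peaks[:, 1, None, None]) ** 2)
    label = np.argmin(dist, axis=0)
    return np.stack([X1 * (label == j) for j in range(len(peaks))])

# Synthetic demo: two sources whose time-frequency supports do not overlap.
rng = np.random.default_rng(0)
F, T = 64, 40
omega = np.linspace(0.05, 3.0, F)[:, None]         # frequencies, no phase wrap
S = rng.standard_normal((2, F, T)) + 1j * rng.standard_normal((2, F, T))
owner = rng.integers(0, 2, size=(F, T))            # which source owns each point
for j in range(2):
    S[j] *= (owner == j)                           # enforce disjoint supports

a_true = np.array([0.5, 1.5])                      # hypothetical mixing parameters
d_true = np.array([-0.5, 0.7])
X1 = S.sum(axis=0)
X2 = sum(a_true[j] * np.exp(-1j * omega * d_true[j]) * S[j] for j in range(2))

hist, a_edges, d_edges, a_hat, d_hat = mixing_histogram(X1, X2, omega)
peaks = top_peaks(hist, a_edges, d_edges, n=2)
est = demix(X1, a_hat, d_hat, peaks)
print(peaks)
```

Because the supports here are exactly disjoint, every time-frequency point yields the mixing parameters of the single source active there, so the histogram mass collapses into one bin per source. With real speech the supports overlap partially, the peaks spread out, and a more careful peak detector than `top_peaks` would be needed.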