Blind source separation of mixtures of speech signals has received considerable attention in the research community over the last two years. One computationally efficient method uses a gradient search algorithm to maximize the kurtosis of the outputs, thereby separating the source signals. While excellent separation results (30–50 dB SIR) have been reported for this method, it assumes a simple linear mixing model. The general case calls for convolutional mixing models; however, this is a rather difficult problem because of causality and stability restrictions on the inverse, not to mention the length requirements of the FIR approximation. Research results for the general problem are modest at best. In this paper, we extend the kurtosis maximization approach to source separation by including delays in the mixing model, so that it at least accounts for propagation delays from speakers to microphones. The algorithm first estimates the relative delays of the sources within each mixture using a standard autocorrelation technique. These delay estimates are then used in the kurtosis maximization algorithm, whose separation matrix is modified to include the delays. Simulation results (using the TIMIT speech corpus) generally indicate good separation quality (10–20 dB) with little additional computational overhead.
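
To make the kurtosis criterion concrete, the following is a minimal sketch of the delay-free (simple linear mixing) case only, not the algorithm of this paper. It rests on several assumptions that are not in the text above: two sources and two mixtures, Laplacian noise standing in for speech, prewhitening of the mixtures, and a coarse grid search over a single rotation angle standing in for the gradient search. The names `excess_kurtosis`, `whiten`, and `separate_two` are hypothetical.

```python
import numpy as np


def excess_kurtosis(y):
    """Normalized fourth moment minus 3 (zero for a Gaussian signal)."""
    y = y - y.mean()
    return np.mean(y**4) / np.mean(y**2) ** 2 - 3.0


def whiten(x):
    """Decorrelate the rows of x and scale them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    vals, vecs = np.linalg.eigh(np.cov(x))
    # Symmetric whitening matrix V * diag(vals^-1/2) * V^T
    return (vecs / np.sqrt(vals)) @ vecs.T @ x


def separate_two(x, n_angles=181):
    """Rotate whitened mixtures to maximize the summed output kurtosis.

    Assumption: a grid search over the angle replaces the gradient search
    described in the abstract; the criterion has period pi/2 in the angle.
    """
    z = whiten(x)
    best_y, best_score = z, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        y = np.array([[c, s], [-s, c]]) @ z
        score = excess_kurtosis(y[0]) + excess_kurtosis(y[1])
        if score > best_score:
            best_y, best_score = y, score
    return best_y


# Illustrative data: super-Gaussian (Laplacian) noise, not real speech.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 20000))
mixing = np.array([[1.0, 0.6], [0.5, 1.0]])  # hypothetical mixing matrix
recovered = separate_two(mixing @ sources)
```

The outputs are recovered only up to permutation, sign, and scale, which is the usual ambiguity in blind source separation. Handling propagation delays, as the abstract proposes, would additionally require estimating the delays and building them into the separation matrix; that step is not shown here.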