A new robust blind microphone array method is presented for enhancing speech signals generated by multiple moving sources in a noisy environment. The approach is based on a two-stage scheme. In the first stage, a subband clustering time-delay estimation algorithm localizes the dominant speech sources. In the second stage, speech enhancement is performed by a soft-constrained subband beamformer that uses the acquired spatial information. The robustness of this structure is ensured by the spatial constraint, which is constructed to account for discrepancies in the acoustical environment model as well as errors in the time-delay estimation. Such a scheme also allows the beamformer to adapt efficiently to speaker movement. The proposed subband clustering approach for time-delay estimation exploits the sparseness of speech signals in the time-frequency domain to localize multiple speakers simultaneously. It also provides a means of selecting the number of target sources. Evaluation in a real environment with moving speakers shows promising results.
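The localization idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes only two microphones, integer-sample delays below four samples, and synthetic sources that occupy disjoint frequency bins (an idealized form of time-frequency sparseness). The function names, frame length, threshold, and histogram resolution are all hypothetical choices made for illustration. Each active time-frequency bin yields one delay estimate from the inter-microphone phase difference, and a histogram over those estimates stands in for the clustering step.

```python
import cmath
import math
from collections import Counter


def dft_bins(frame, bins):
    """Naive DFT of one frame, evaluated only at the requested bin indices."""
    n = len(frame)
    return {
        k: sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in bins
    }


def bin_delays(x1, x2, frame_len=64, max_bin=8, rel_thresh=0.1):
    """Delay of x2 relative to x1 (in samples), one estimate per active bin."""
    # Only low bins are used: with max_bin=8 and frame_len=64 the phase
    # stays unwrapped for delays smaller than 4 samples in magnitude.
    bins = range(1, max_bin + 1)
    delays = []
    for start in range(0, len(x1) - frame_len + 1, frame_len):
        X1 = dft_bins(x1[start:start + frame_len], bins)
        X2 = dft_bins(x2[start:start + frame_len], bins)
        peak = max(abs(X1[k]) for k in bins)
        if peak == 0:
            continue  # silent frame
        for k in bins:
            if abs(X1[k]) < rel_thresh * peak:
                continue  # skip bins with little energy
            omega = 2 * math.pi * k / frame_len
            # x2(t) = s(t - d) gives X2 = X1 * exp(-j * omega * d)
            phase = cmath.phase(X2[k] * X1[k].conjugate())
            delays.append(-phase / omega)
    return delays


def cluster_delays(delays, resolution=0.5, n_sources=2):
    """Histogram the per-bin delays and return the n_sources most common values."""
    hist = Counter(round(d / resolution) * resolution for d in delays)
    return sorted(d for d, _ in hist.most_common(n_sources))


def demo_signals(n=256, frame_len=64):
    """Two tones in different bins: source A delayed by 2 samples, source B by -3."""
    w_a = 2 * math.pi * 3 / frame_len
    w_b = 2 * math.pi * 5 / frame_len
    x1 = [math.cos(w_a * t) + math.cos(w_b * t + 0.3) for t in range(n)]
    x2 = [math.cos(w_a * (t - 2)) + math.cos(w_b * (t + 3) + 0.3) for t in range(n)]
    return x1, x2


if __name__ == "__main__":
    mic1, mic2 = demo_signals()
    print(cluster_delays(bin_delays(mic1, mic2)))
```

For these synthetic signals the histogram peaks fall at delays of -3 and 2 samples, one per source. Real speech would need windowed, overlapping frames, handling of phase wrapping at higher frequencies, and a more careful clustering step; the `n_sources` parameter here only hints at how the number of target sources might be selected.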