Significance of the MUSIC-group delay spectrum in speech acquisition from distant microphones

Conventionally, the magnitude spectrum of MUSIC is used to estimate directions of arrival (DOAs) for beamforming and clean speech acquisition from distant microphones. However, MUSIC cannot resolve closely spaced DOAs with a practical number of sensors. In this paper we propose the use of the group delay function computed from the MUSIC phase spectrum for efficient DOA estimation. The group delay function, which has hitherto been used for temporal frequency processing of speech signals, is computed on the phase spectrum of MUSIC and is found to resolve spatially contiguous speech sources. The additive property of the group delay function in the spatial domain is also discussed using root-MUSIC polynomial analysis. Experimental results on DOA estimation using a two-channel microphone array show that the MUSIC group delay spectrum yields a lower average DOA estimation error than the MUSIC magnitude spectrum. Filter-and-sum beamformers are then trained using the estimated DOAs on speech acquired from distant microphones. Speech recognition experiments conducted on meeting room data illustrate the significance of the MUSIC group delay spectrum in speech acquisition from distant microphones.
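The abstract does not spell out how the group delay is obtained from the MUSIC phase spectrum. What follows is a minimal sketch under stated assumptions: a narrowband far-field model on a hypothetical 8-element uniform linear array, and one formulation used in the MUSIC-group-delay literature. In that formulation, the group delay is the negative derivative, with respect to arrival angle, of the unwrapped phase of the projection of the scanning steering vector onto each noise-subspace eigenvector. The squared group delays are summed and weighted by the MUSIC magnitude spectrum. All function names (`steering_vector`, `simulate_snapshots`, `music_group_delay_spectrum`, `top_peaks`) and parameter values are illustrative assumptions. The paper's exact formulation, and its broadband two-channel setup, may differ.

```python
# Hypothetical sketch: names, array geometry, and parameters are illustrative
# assumptions, not the paper's implementation.
import numpy as np


def steering_vector(theta_deg, num_mics, spacing_over_wavelength):
    """Narrowband far-field steering vector of a uniform linear array (ULA)."""
    theta = np.deg2rad(theta_deg)
    m = np.arange(num_mics)
    return np.exp(-2j * np.pi * spacing_over_wavelength * m * np.sin(theta))


def simulate_snapshots(doas_deg, num_mics, spacing_over_wavelength,
                       num_snapshots, noise_std, rng):
    """Hypothetical test signal: uncorrelated complex Gaussian sources plus white noise."""
    A = np.stack([steering_vector(t, num_mics, spacing_over_wavelength)
                  for t in doas_deg], axis=1)
    shape_s = (len(doas_deg), num_snapshots)
    S = (rng.standard_normal(shape_s) + 1j * rng.standard_normal(shape_s)) / np.sqrt(2)
    shape_n = (num_mics, num_snapshots)
    N = noise_std * (rng.standard_normal(shape_n)
                     + 1j * rng.standard_normal(shape_n)) / np.sqrt(2)
    return A @ S + N


def music_group_delay_spectrum(X, num_sources, grid_deg, spacing_over_wavelength):
    """Return (MUSIC magnitude spectrum, MUSIC-group-delay spectrum) on grid_deg."""
    M, T = X.shape
    R = X @ X.conj().T / T                      # sample spatial covariance
    _, eigvecs = np.linalg.eigh(R)              # eigenvalues in ascending order
    En = eigvecs[:, : M - num_sources]          # noise-subspace eigenvectors
    A = np.stack([steering_vector(t, M, spacing_over_wavelength)
                  for t in grid_deg], axis=1)   # M x G scanning steering vectors
    proj = En.conj().T @ A                      # (M - N) x G projections
    p_music = 1.0 / np.sum(np.abs(proj) ** 2, axis=0)
    # Group delay: negative derivative of the unwrapped phase w.r.t. angle (radians).
    phase = np.unwrap(np.angle(proj), axis=1)
    step = np.deg2rad(grid_deg[1] - grid_deg[0])
    group_delay = -np.gradient(phase, step, axis=1)
    # Sum squared group delays over noise eigenvectors, weight by MUSIC magnitude.
    p_mgd = np.sum(group_delay ** 2, axis=0) * p_music
    return p_music, p_mgd


def top_peaks(spectrum, grid_deg, k):
    """Angles of the k largest interior local maxima, sorted ascending."""
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] >= spectrum[2:])
    idx = np.flatnonzero(is_peak) + 1
    best = idx[np.argsort(spectrum[idx])[::-1][:k]]
    return np.sort(grid_deg[best])


# Usage on simulated data (two hypothetical sources at -20 and 25 degrees).
rng = np.random.default_rng(0)
X = simulate_snapshots([-20.0, 25.0], 8, 0.5, 200, 0.1, rng)
grid = np.linspace(-90.0, 90.0, 721)            # 0.25-degree scan grid
p_music, p_mgd = music_group_delay_spectrum(X, 2, grid, 0.5)
print(top_peaks(p_mgd, grid, 2))
```

With these hypothetical settings, the two largest peaks of `p_mgd` should fall close to the simulated DOAs. At a true DOA, every noise-subspace polynomial has a zero near the unit circle, in the root-MUSIC sense. The phase therefore changes by nearly π over a very small angular interval, so the summed squared group delay spikes there. Keeping the MUSIC magnitude as a weight suppresses spikes caused by spurious zeros elsewhere.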
