Beamforming using uniform circular arrays for distant speech recognition in reverberant environments and double talk scenarios

In hands-free mobile terminals and voice-enabled automated home environments that rely on distant-speech interaction, beamforming is crucial for mitigating causes of system degradation such as interfering noise sources and competing speakers. This paper adapts the most common state-of-the-art broadband beamformers to uniform circular arrays so that competing speakers are attenuated sufficiently for distant speech recognition. This adaptation yields a new beamformer. Finally, the beamformers’ enhanced signals are evaluated with several objective speech quality measures, and a word recognizer is used to measure the attenuation of competing speakers.
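To make the idea of steering a uniform circular array concrete, the following is a minimal sketch of a narrowband delay-and-sum beamformer, the simplest fixed beamformer. It is not the beamformer proposed in the paper, whose design the abstract does not specify. The function names, the far-field plane-wave model restricted to the horizontal plane, and the array parameters in the usage note (8 microphones, 0.1 m radius, 2 kHz) are assumptions chosen for illustration only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def uca_steering_vector(num_mics, radius, freq_hz, azimuth_rad):
    """Far-field steering vector of a uniform circular array (horizontal plane).

    Microphone m is assumed to sit at angle 2*pi*m/num_mics on a circle of
    the given radius; phases are taken relative to the array centre.
    """
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    vector = []
    for m in range(num_mics):
        mic_angle = 2.0 * math.pi * m / num_mics
        # Phase of a plane wave arriving from azimuth_rad at microphone m.
        phase = wavenumber * radius * math.cos(azimuth_rad - mic_angle)
        vector.append(cmath.exp(1j * phase))
    return vector


def delay_and_sum_response(num_mics, radius, freq_hz, look_rad, source_rad):
    """Magnitude response of a delay-and-sum beamformer steered to look_rad
    for a plane wave arriving from source_rad."""
    look = uca_steering_vector(num_mics, radius, freq_hz, look_rad)
    weights = [d / num_mics for d in look]  # w = d(look) / M
    source = uca_steering_vector(num_mics, radius, freq_hz, source_rad)
    # Beamformer output for a unit-amplitude plane wave: w^H d(source).
    return abs(sum(w.conjugate() * d for w, d in zip(weights, source)))


if __name__ == "__main__":
    target = delay_and_sum_response(8, 0.1, 2000.0, 0.0, 0.0)
    competitor = delay_and_sum_response(8, 0.1, 2000.0, 0.0, math.pi / 2)
    print(f"response toward target speaker:    {target:.3f}")
    print(f"response toward competing speaker: {competitor:.3f}")
```

In this sketch the response in the look direction is unity by construction, while a source at a different azimuth is attenuated by an amount that depends on frequency, radius, and microphone count. A broadband beamformer would apply such weights per frequency bin.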
