Theoretical analysis of parametric blind spatial subtraction array and its application to speech recognition performance prediction

In this paper, we introduce an improved parametric postfilter into our previously proposed blind spatial subtraction array (BSSA) and theoretically analyze, via higher-order statistics, both the amount of musical noise it generates and the amount of noise reduction it achieves. The analysis clarifies that parametric BSSA can improve speech recognition performance over the conventional BSSA. Next, we propose an unsupervised metric for predicting speech recognition performance, based on higher-order statistics in BSSA. We show that the kurtosis of the noise and of the speech can be used to predict speech recognition performance without any reference speech signal.
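As a rough illustration of the kind of higher-order statistic involved, the following sketch computes the kurtosis of noise power-spectral samples before and after a simple spectral subtraction and takes their ratio. This is not the parametric BSSA postfilter or the prediction metric of the paper: the function names, the oversubtraction factor `beta`, the flooring rule, the use of central moments, and the exponential model for the noise power are all assumptions made for this example.

```python
import numpy as np


def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)
    m4 = np.mean(centered ** 4)
    return m4 / m2 ** 2


def spectral_subtraction(power, noise_estimate, beta=2.0, floor=0.0):
    """Hypothetical power-domain spectral subtraction with flooring.

    beta  : oversubtraction factor (assumed value, not from the paper)
    floor : fraction of the input power kept when the result is too small
    """
    power = np.asarray(power, dtype=float)
    return np.maximum(power - beta * noise_estimate, floor * power)


# Assumption: the noise is Gaussian in the complex spectral domain,
# so its power spectrum is modeled as exponentially distributed.
rng = np.random.default_rng(0)
noise_power = rng.exponential(scale=1.0, size=100_000)

# Subtract a scaled estimate of the mean noise power.
processed = spectral_subtraction(noise_power, noise_power.mean(), beta=2.0)

# Ratio of kurtosis after processing to kurtosis before processing.
kurtosis_before = kurtosis(noise_power)
kurtosis_after = kurtosis(processed)
kurtosis_ratio = kurtosis_after / kurtosis_before
```

Under these assumptions the subtraction leaves a few isolated spectral peaks, so the processed noise has a heavier-tailed distribution and a kurtosis ratio above one. Only noise-dominant samples are needed to compute it, which is in the spirit of a reference-free measure.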
