Speech-stream detection with low signal-to-noise ratios based on empirical mode decomposition and fourth-order statistics

Speech-stream detection plays an important role in short-wave communication. Listening for long periods is tiring for a human operator, especially in adverse environments. An algorithm for speech-stream detection in noisy environments, based on empirical mode decomposition (EMD) and the statistical properties of the higher-order cumulants of speech signals, is presented. With EMD, a noisy signal can be decomposed into a varying number of intrinsic mode functions (IMFs). The fourth-order cumulant (FOC) is then used to extract the desired statistical feature from the IMF components. Since higher-order cumulants are blind to Gaussian signals, the proposed method is especially effective for speech-stream detection when the speech signal is distorted by Gaussian noise. In addition, owing to the self-adaptive decomposition performed by EMD, the method also works well for non-Gaussian noise. Experiments show that the proposed algorithm can suppress different noise types at different SNRs and that it is robust in tests on real signals.
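The claim that higher-order cumulants are blind to Gaussian signals can be illustrated with a small numerical sketch. It is not the algorithm from the paper: it omits the EMD step entirely, it uses only the zero-lag fourth-order cumulant, and it stands in a Laplacian sequence for speech, which is an assumption made purely for illustration. The function names `fourth_order_cumulant` and `framewise_foc` are hypothetical. In the method described above, a statistic of this kind would be computed on each IMF rather than on the raw signal.

```python
import random


def fourth_order_cumulant(frame):
    """Zero-lag fourth-order cumulant of one frame.

    After removing the mean, this is E[x^4] - 3 * (E[x^2])^2,
    which is zero in expectation for a Gaussian process.
    """
    n = len(frame)
    mean = sum(frame) / n
    centered = [x - mean for x in frame]
    m2 = sum(x * x for x in centered) / n
    m4 = sum(x ** 4 for x in centered) / n
    return m4 - 3.0 * m2 * m2


def framewise_foc(signal, frame_len):
    """Cumulant of each non-overlapping frame (hypothetical helper)."""
    last_start = len(signal) - frame_len
    return [
        fourth_order_cumulant(signal[i:i + frame_len])
        for i in range(0, last_start + 1, frame_len)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000

# Unit-variance Gaussian noise: the cumulant should be close to zero.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Unit-variance Laplacian sequence, used here as an ASSUMED stand-in for a
# heavy-tailed, speech-like signal. The difference of two exponentials with
# scale b is Laplacian with variance 2 * b^2, so b = 1 / sqrt(2).
b = 2 ** -0.5
laplacian = [rng.expovariate(1 / b) - rng.expovariate(1 / b) for _ in range(n)]

foc_gauss = fourth_order_cumulant(gaussian)
foc_lap = fourth_order_cumulant(laplacian)

print("Gaussian FOC: ", foc_gauss)
print("Laplacian FOC:", foc_lap)
```

The Gaussian sequence yields a cumulant near zero, while the heavy-tailed sequence yields a clearly positive value, so thresholding a frame-wise statistic such as `framewise_foc(signal, frame_len)` is one plausible way to separate the two. The frame length and any threshold would have to be chosen empirically.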
