The fourth-order cumulant of speech signals with application to voice activity detection

This paper explores the fourth-order cumulants (FOC) of the LPC residual of speech signals and presents a new algorithm for voice activity detection (VAD) based on the newly established FOC properties. Analytical expressions for the horizontal slice of the fourth-order cumulant and for the kurtosis of voiced speech are derived from a previously reported sinusoidal model [4]. The derivations show that the kurtosis of voiced speech differs from that of Gaussian noise and can therefore help detect voicing. The proposed VAD combines FOC metrics with SNR measures to classify frames as speech or noise. Its performance is compared with that of the ITU-T G.729B VAD [2] under various noise conditions and is quantified by the probabilities of correct and false classification. The results show that the proposed VAD performs comparably to G.729B overall: its probability of false classification is lower at low SNR and in Gaussian-like noise, but higher in speech-like noise.
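The voicing cue above rests on one statistic: the excess kurtosis κ = E[x⁴]/(E[x²])² − 3 of a zero-mean signal, which is zero for Gaussian noise. The Python sketch below is only an illustration under stated assumptions, not the authors' method. The helper names (`excess_kurtosis`, `random_phase_sinusoid_kurtosis`, `sinusoid_frame`) are hypothetical. A synthetic sum of sinusoids stands in for the LPC residual of voiced speech, and no LPC inverse filtering is performed. The closed form −(3/2)·ΣA⁴/(ΣA²)² is the textbook ensemble result for sinusoids at distinct frequencies with independent, uniformly distributed phases. It is not the expression derived in the paper.

```python
import math
import random


def excess_kurtosis(frame):
    """Sample excess kurtosis m4 / m2**2 - 3; close to 0 for Gaussian data."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((x - mean) ** 2 for x in frame) / n
    m4 = sum((x - mean) ** 4 for x in frame) / n
    return m4 / m2 ** 2 - 3.0


def random_phase_sinusoid_kurtosis(amplitudes):
    """Ensemble excess kurtosis of a sum of sinusoids at distinct frequencies
    with independent uniform phases: -(3/2) * sum(A^4) / (sum(A^2))^2.
    (Assumed textbook model, not the paper's derived expression.)"""
    s2 = sum(a ** 2 for a in amplitudes)
    s4 = sum(a ** 4 for a in amplitudes)
    return -1.5 * s4 / s2 ** 2


def sinusoid_frame(bins, amplitudes, phases, n):
    """Frame of n samples: sum over k of A_k * cos(2*pi*bin_k*t/n + phase_k)."""
    return [
        sum(a * math.cos(2 * math.pi * k * t / n + p)
            for k, a, p in zip(bins, amplitudes, phases))
        for t in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    n = 400
    # Two equal-amplitude tones (DFT bins 3 and 7) with random phases stand in
    # for a "voiced" frame; white Gaussian samples stand in for noise.
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in range(2)]
    tones = sinusoid_frame([3, 7], [1.0, 1.0], phases, n)
    noise = [rng.gauss(0.0, 1.0) for _ in range(20000)]
    print(f"two tones: {excess_kurtosis(tones):+.3f} "
          f"(model {random_phase_sinusoid_kurtosis([1.0, 1.0]):+.3f})")
    print(f"Gaussian : {excess_kurtosis(noise):+.3f} (model +0.000)")
```

For the two tones, the frame estimate matches the model value of −0.75. The Gaussian estimate stays near zero. A detector can exploit that gap. How the paper turns it into a decision rule, and how it combines that rule with SNR measures, is not reproduced here.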

[1] George Carayannis et al., Higher order statistics based Gaussianity test applied to on-line speech processing, 1994, Proceedings of 1994 28th Asilomar Conference on Signals, Systems and Computers.

[2] E. Shlomot et al., ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications, 1997, IEEE Commun. Mag.

[3] B. Wells et al., Voiced/unvoiced decision based on the bispectrum, 1985, ICASSP '85, IEEE International Conference on Acoustics, Speech, and Signal Processing.

[4] S. Mahmoud et al., The third-order cumulant of speech signals with application to reliable pitch estimation, 1998, Ninth IEEE Signal Processing Workshop on Statistical Signal and Array Processing.

[5] I. Boyd et al., The voice activity detector for the Pan-European digital cellular mobile telephone service, 1988, International Conference on Acoustics, Speech, and Signal Processing.

[6] Alessandro Falaschi et al., Speech innovation characterisation by higher-order moments, 1993.

[7] Thomas F. Quatieri et al., Speech analysis/synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.