Speech Waveform Compression Using Robust Adaptive Voice Activity Detection for Nonstationary Noise in Multimedia Communications

The voice activity detection (VAD) is crucial in all kinds of speech applications. However, almost all existing VAD algorithms suffer from the nonstationarity of both speech and noise. To combat this difficulty, we propose a new voice activity detector, which is based on the Mel-energy features and an adaptive threshold related to the signal-to-noise ratio (SNR) estimates. In this paper, we first justify the robustness of the Bayes classifier using the Mel-energy features over that using the Fourier spectral features in various noise environments. Then, we design an algorithm using the dynamic Mel-energy estimator and the adaptive threshold which depends on the SNR estimates. In addition, a realignment scheme is incorporated to correct the sparse-and- spurious noise estimates. Numerous simulations are carried out to evaluate the performance of our proposed VAD method and the comparisons are made with a couple existing representative schemes, namely the VAD using the likelihood ratio test with Fourier spectral energy features and that based on the enhanced time-frequency parameters. Three types of noise, namely white noise (stationary), babble noise (nonstationary) and vehicular noise (nonstationary) were artificially added by the computer for our experiments. As a result, our proposed VAD algorithm significantly outperforms other existing methods as illustrated by the corresponding receiver operating curves (ROCs). Finally, we demonstrate one of the major applications, namely speech waveform compression, associated with our new robust VAD scheme and quantify the effectiveness in terms of compression efficiency.

[1]  Jhing-Fa Wang,et al.  A wavelet-based voice activity detection algorithm in noisy environments , 2002, 9th International Conference on Electronics, Circuits and Systems.

[2]  M. Hahn,et al.  An improved speech detection algorithm for isolated Korean utterances , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  M. Melamed Detection , 2021, SETI: Astronomy as a Contact Sport.

[4]  E. Shlomot,et al.  ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications , 1997, IEEE Commun. Mag..

[5]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[6]  John Mason,et al.  Robust voice activity detection using cepstral features , 1993, Proceedings of TENCON '93. IEEE Region 10 International Conference on Computers, Communications and Automation.

[7]  M. Gabrea,et al.  Correlation coefficient-based voice activity detector algorithm , 2004, Canadian Conference on Electrical and Computer Engineering 2004 (IEEE Cat. No.04CH37513).

[8]  Wonyong Sung,et al.  A voice activity detector employing soft decision based noise spectrum adaptation , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[9]  Robert W. Donaldson,et al.  Adaptive silence deletion for speech storage and voice mail applications , 1988, IEEE Trans. Acoust. Speech Signal Process..

[10]  Lawrence R. Rabiner,et al.  Voiced-unvoiced-silence detection using the Itakura LPC distance measure , 1977 .

[11]  Chin-Teng Lin,et al.  A robust word boundary detection algorithm for variable noise-level environment in cars , 2002, IEEE Trans. Intell. Transp. Syst..

[12]  P. Kabal,et al.  Comparison of voice activity detection algorithms for wireless personal communications systems , 1997, CCECE '97. Canadian Conference on Electrical and Computer Engineering. Engineering Innovation: Voyage of Discovery. Conference Proceedings.

[13]  Wonyong Sung,et al.  A statistical model-based voice activity detection , 1999, IEEE Signal Processing Letters.

[14]  Saeed Vaseghi,et al.  Noise compensation methods for hidden Markov model speech recognition in adverse environments , 1997, IEEE Trans. Speech Audio Process..

[15]  Birger Kollmeier,et al.  Speech pause detection for noise spectrum estimation by tracking power envelope dynamics , 2002, IEEE Trans. Speech Audio Process..

[16]  David G. Stork,et al.  Pattern classification, 2nd Edition , 2000 .

[17]  Herman J. M. Steeneken,et al.  Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems , 1993, Speech Commun..

[18]  R. W. Donaldson,et al.  Real-time implementation and evaluation of an adaptive silence deletion algorithm for speech compression , 1991, [1991] IEEE Pacific Rim Conference on Communications, Computers and Signal Processing Conference Proceedings.