Voiced/Nonvoiced Detection Based on Robustness of Voiced Epochs

This paper proposes a new method for voiced/nonvoiced detection based on epoch extraction. The zero-frequency filtered speech signal is used to extract the instants of significant excitation (epochs). Epoch extraction in voiced regions is robust: the detected epochs remain stable even when a small amount of white noise is added to the signal. This robustness is used to distinguish voiced epochs from the random instants detected in nonvoiced regions. The main feature of the proposed method is that it relies on the strength of glottal activity rather than on the periodicity of the signal. The performance of the proposed algorithm is studied on the TIMIT and CMU ARCTIC databases for two noise types from the NOISEX database, white noise and vehicle noise, at different signal-to-noise ratios (SNRs). The proposed method performs comparably to or better than the popular normalized cross-correlation based voiced/nonvoiced detection used in the open-source utility wavesurfer, especially at lower SNRs.
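
The idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the commonly described form of zero-frequency filtering (difference the signal, pass it through two cascaded resonators with poles at zero frequency, remove the trend by repeated local-mean subtraction, and take positive zero crossings as epoch candidates). The function names `zff_epochs` and `robust_epochs`, the fixed 15 ms trend-removal window, the 20 dB noise level, the 0.5 ms tolerance, and the three noise trials are all assumptions chosen for illustration; the abstract does not specify them or the exact decision rule.

```python
import numpy as np


def zff_epochs(s, fs, win_ms=15.0, passes=3):
    """Epoch candidates: positive zero crossings of a zero-frequency filtered signal.

    win_ms is an assumed fixed trend-removal window (roughly 1-2 pitch periods).
    """
    s = np.asarray(s, dtype=np.float64)
    # Difference the signal to remove any DC offset.
    y = np.diff(s, prepend=s[0])
    # Two cascaded resonators at 0 Hz, each 1 / (1 - z^-1)^2, i.e. four running sums.
    for _ in range(4):
        y = np.cumsum(y)
    # Remove the polynomial trend by subtracting a local mean several times.
    half = int(round(win_ms * 1e-3 * fs / 2))
    kernel = np.ones(2 * half + 1) / (2 * half + 1)
    for _ in range(passes):
        y = y - np.convolve(y, kernel, mode="same")
    # Negative-to-positive zero crossings are taken as epoch candidates.
    return np.flatnonzero((y[:-1] < 0) & (y[1:] >= 0)) + 1


def robust_epochs(s, fs, snr_db=20.0, tol_ms=0.5, trials=3, seed=0):
    """Flag epochs that stay in place when a little white noise is added.

    Returns (epochs, keep): candidate epoch indices from the unmodified signal
    and a boolean mask that is True where every noisy trial produced an epoch
    within tol_ms of the candidate. All thresholds here are assumptions.
    """
    s = np.asarray(s, dtype=np.float64)
    rng = np.random.default_rng(seed)
    epochs = zff_epochs(s, fs)
    noise_std = np.sqrt(np.mean(s ** 2)) * 10.0 ** (-snr_db / 20.0)
    tol = int(round(tol_ms * 1e-3 * fs))
    keep = np.ones(len(epochs), dtype=bool)
    for _ in range(trials):
        noisy = zff_epochs(s + rng.normal(0.0, noise_std, size=s.shape), fs)
        if len(noisy) == 0:
            keep[:] = False
            break
        # Distance from each candidate to its nearest epoch in the noisy signal.
        pos = np.searchsorted(noisy, epochs)
        left = noisy[np.clip(pos - 1, 0, len(noisy) - 1)]
        right = noisy[np.clip(pos, 0, len(noisy) - 1)]
        dist = np.minimum(np.abs(epochs - left), np.abs(epochs - right))
        keep &= dist <= tol
    return epochs, keep
```

In this sketch, candidates with `keep == True` would be treated as voiced epochs and the rest as random instants from nonvoiced regions. Candidates within a few window lengths of the start or end of the signal are unreliable, because the local-mean subtraction is zero-padded at the edges.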
