Robust Whisper Activity Detection Using Long-Term Log Energy Variation of Sub-Band Signal

The goal of whisper activity detection (WAD) is to find the whispered speech segments in a given noisy recording of whispered speech. Since whispering lacks periodic glottal excitation, it resembles unvoiced speech. This noise-like nature of whispered speech makes WAD a more challenging task than a typical voice activity detection (VAD) problem. In this paper, we propose a feature for WAD based on the long-term variation of the logarithm of the short-time sub-band signal energy. We also propose an automatic sub-band selection algorithm to maximally discriminate noisy whisper from noise. Experiments with eight noise types at four different signal-to-noise ratio (SNR) conditions show that, for most of the noise types, the performance of the proposed WAD scheme is significantly better than that of existing VAD schemes and whisper detection schemes when used for WAD.
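As a rough illustration of the kind of feature described above, the following sketch computes the log of the short-time energy in one sub-band and then measures how much it varies over a long-term window of frames. It is only one plausible reading of the abstract, not the paper's definition: the function name, the use of sample variance as the "variation" measure, the fixed 1000-4000 Hz band, the 20 ms frame, the 10 ms hop, and the 30-frame context are all assumptions made for illustration. The paper selects the sub-band automatically, which this sketch does not attempt.

```python
import numpy as np

def long_term_log_energy_variation(x, fs, band=(1000.0, 4000.0),
                                   frame_len=0.02, hop=0.01, context=30):
    """Variance of sub-band log energy over the last `context` frames.

    All parameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    n = int(round(frame_len * fs))   # samples per frame
    h = int(round(hop * fs))         # samples per hop
    if len(x) < n:
        return np.zeros(0)
    num_frames = 1 + (len(x) - n) // h

    window = np.hanning(n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    # Short-time log energy restricted to the chosen sub-band.
    log_energy = np.empty(num_frames)
    for i in range(num_frames):
        frame = x[i * h : i * h + n] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        log_energy[i] = np.log(power[in_band].sum() + 1e-12)

    # Long-term variation: variance over the current and preceding frames.
    feature = np.empty(num_frames)
    for i in range(num_frames):
        start = max(0, i - context + 1)
        feature[i] = np.var(log_energy[start : i + 1])
    return feature
```

Under these assumptions, stationary noise yields a nearly constant log energy and hence a small feature value, while a recording in which noise-like whisper segments come and go yields a larger one. A detector could then threshold the feature per frame; the threshold and any smoothing would need to be tuned on data.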
