Significance of automatic detection of vowel regions for automatic shout detection in continuous speech

Automatic detection of shout prosody in a continuous speech signal involves examining changes in its production characteristics. Our recent study of electroglottograph signals showed that significant changes occur in the glottal excitation source characteristics during the production of shouted speech, especially in vowel contexts. However, differences between normal and shouted speech in production features derived over utterances or word segments may sometimes be masked by variations related to pauses or unvoiced regions. Moreover, for a real-time system, the vowel regions need to be found automatically. In this paper, changes in the shout production features are examined in automatically detected vowel regions. Production of a vowel involves periodic impulse-like excitation and relatively high signal energy. Hence, knowledge of the epochs obtained using zero-frequency filtering, together with accurate vowel onset points, can be used to detect these regions. Changes in two excitation source features, the instantaneous fundamental frequency and the strength of excitation, and in one vocal tract filter feature, the dominant frequency, are examined for the steady regions of five vowels. Larger changes in these distinguishing features are observed in the automatically found vowel regions than in word segments. This approach can help improve systems for automatic detection of shout regions in continuous speech, as well as paralinguistic applications that involve detection of prosody or emotions.
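The abstract does not give implementation details, so the following is only a minimal sketch of the general zero-frequency filtering idea it mentions: difference the signal, pass it through two cascaded zero-frequency resonators, remove the slowly growing trend by repeated local-mean subtraction, and take positive-going zero crossings as epoch candidates. The function names, the trend-removal window of about 1.5 pitch periods, the number of passes, and the use of the slope at each crossing as the strength of excitation are illustrative assumptions, not the authors' stated settings.

```python
def remove_local_mean(y, half):
    """Subtract the mean over a window of 2*half+1 samples (shorter at the edges)."""
    n = len(y)
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(y[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zero_frequency_filter(signal, fs, mean_pitch_period_s=0.01, passes=3):
    """Return a zero-frequency filtered version of `signal` (a list of floats).

    mean_pitch_period_s and passes are assumed, hand-picked parameters.
    """
    # Difference the signal to suppress any DC offset.
    x = [0.0] + [signal[i] - signal[i - 1] for i in range(1, len(signal))]
    # Two cascaded resonators at 0 Hz: y[n] = x[n] + 2*y[n-1] - y[n-2].
    y = x
    for _ in range(2):
        out, y1, y2 = [], 0.0, 0.0
        for v in y:
            cur = v + 2.0 * y1 - y2
            out.append(cur)
            y2, y1 = y1, cur
        y = out
    # Remove the polynomial trend; window is roughly 1.5 pitch periods (assumption).
    half = max(1, int(round(0.75 * mean_pitch_period_s * fs)))
    for _ in range(passes):
        y = remove_local_mean(y, half)
    return y


def epochs_and_features(zff, fs):
    """Epoch candidates, slope at each epoch, and F0 from successive epoch intervals."""
    # Positive-going zero crossings of the filtered signal.
    epochs = [i for i in range(1, len(zff)) if zff[i - 1] < 0.0 <= zff[i]]
    # Slope at the crossing, used here as a proxy for strength of excitation.
    strength = [zff[i] - zff[i - 1] for i in epochs]
    # Instantaneous F0 as the reciprocal of the interval between epochs.
    f0 = [fs / (b - a) for a, b in zip(epochs, epochs[1:])]
    return epochs, strength, f0


if __name__ == "__main__":
    fs = 8000
    # Synthetic impulse train at 100 Hz (one impulse every 80 samples).
    sig = [1.0 if i % 80 == 0 else 0.0 for i in range(4000)]
    zff = zero_frequency_filter(sig, fs, mean_pitch_period_s=0.01)
    epochs, strength, f0 = epochs_and_features(zff, fs)
    print(len(epochs), sorted(f0)[len(f0) // 2])
```

On the synthetic 100 Hz impulse train in the usage example, the median of the F0 estimates should come out close to 100 Hz. Values within a few window lengths of either end of the signal are affected by the shortened averaging window and are best ignored. On real speech, the mean pitch period would have to be estimated from the data, and crossings in unvoiced or silent regions would need to be screened out, for example by their low slope.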
