Excitation Source Feature for Discriminating Shouted and Normal Speech

The dynamics of shouted speech production differ significantly from those of normal speech. These variations can be analyzed from excitation source information using the differenced electroglottogram (DEGG) signal. This work makes two contributions. First, it proposes a novel Glottal Open Phase Tilt (GOPT) feature, derived from the DEGG signal, for discriminating shouted and normal speech. Second, it constructs a database of speech and corresponding electroglottogram (EGG) signals for analyzing the performance of the proposed feature. During shouting, the vocal folds vibrate faster and close abruptly, which brings each glottal opening instant relatively close to the following closing instant. This observation motivated the proposed tilt feature for discriminating shouted from normal speech. The feature is also extracted from integrated linear prediction residual (ILPR) signals, which are known to approximate DEGG signals. Experiments on the collected dataset yielded shouted speech detection rates of 90.9% with DEGG signals and 76.37% with ILPR signals.
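The abstract does not spell out how GOPT is computed. As a rough illustration of the kind of DEGG processing it describes, the following Python sketch differences an EGG signal, picks glottal closing instants (GCIs) and glottal opening instants (GOIs) from DEGG peaks, and measures, per cycle, the slope of the line joining the GOI peak to the next GCI peak. That slope is a hypothetical stand-in, not the paper's GOPT definition. The function names, the contact-up EGG polarity (closure gives a positive DEGG peak), the greedy peak picker, and the `f0_max` and `rel_thresh` parameters are all assumptions.

```python
import numpy as np


def degg(egg):
    """Differenced EGG: first-order difference of the EGG signal."""
    return np.diff(np.asarray(egg, dtype=float))


def pick_peaks(x, min_dist, height):
    """Indices of local maxima of x above `height`, at least `min_dist` apart.

    Greedy heuristic (an assumption): stronger peaks are kept first, and
    weaker peaks closer than `min_dist` to a kept peak are discarded.
    """
    inner = x[1:-1]
    cand = np.flatnonzero((inner > x[:-2]) & (inner >= x[2:]) & (inner > height)) + 1
    kept = []
    for i in cand[np.argsort(-x[cand], kind="stable")]:
        if all(abs(i - k) >= min_dist for k in kept):
            kept.append(i)
    return np.array(sorted(kept), dtype=int)


def open_phase_measures(egg, fs, f0_max=600.0, rel_thresh=0.3):
    """Hypothetical per-cycle open-phase measures from the DEGG signal.

    Assumes contact-up EGG polarity: GCIs are strong positive DEGG peaks,
    and the GOI of a cycle is the most negative DEGG sample between two
    consecutive GCIs. This is NOT the paper's exact GOPT definition.
    """
    d = degg(egg)
    min_dist = max(2, int(fs / f0_max))  # shortest allowed pitch period, in samples
    gcis = pick_peaks(d, min_dist, rel_thresh * d.max())
    gois, slopes, open_frac = [], [], []
    for g0, g1 in zip(gcis[:-1], gcis[1:]):
        goi = g0 + 1 + int(np.argmin(d[g0 + 1:g1]))  # strongest negative peak in the cycle
        gois.append(goi)
        # Slope (DEGG units per second) of the line from the GOI peak to the
        # next GCI peak: a GOI closer to the next GCI, or a more abrupt
        # closure (larger GCI peak), makes this line steeper.
        slopes.append((d[g1] - d[goi]) / ((g1 - goi) / fs))
        # Fraction of the cycle spent between opening and the next closure.
        open_frac.append((g1 - goi) / (g1 - g0))
    return gcis, np.array(gois, dtype=int), np.array(slopes), np.array(open_frac)
```

On a synthetic 100 Hz EGG sampled at 8 kHz, with one sharp rise per cycle and a short fall about 30 samples later, the function returns one GCI per rise, one GOI per complete cycle, and a constant slope. Shortening the open phase while keeping the closure equally abrupt increases the slope, in line with the intuition the abstract gives for shouted speech. Any threshold or classifier built on top of such a measure would be a further design choice not described here.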
