Automatic Detection of Shouted Speech Segments in Indian News Debates

Shouted speech detection is an essential pre-processing step in conventional speech processing systems such as speech recognition, speaker recognition, and speaker diarization, among others. The excitation source plays an important role in the production of shouted speech. This work explores a feature computed from the Integrated Linear Prediction Residual (ILPR) signal for shouted speech detection in Indian news debates. The log spectrogram of the ILPR signal captures the time-frequency characteristics of the excitation source. The proposed shouted speech detection system is a deep network with two sub-modules: a CNN-based autoencoder and an attention-based classifier. The autoencoder sub-network helps the classifier learn discriminative deep embeddings for better classification. The proposed classifier is equipped with an attention mechanism and bidirectional gated recurrent units (BiGRUs). Classification results show that the proposed system performs better with the excitation source feature than with the baseline feature, the log spectrogram computed from the pre-emphasized speech signal. A score-level fusion of the classifiers trained on the source feature and on the baseline feature gives the best performance. The performance of the proposed shouted speech detection system is also evaluated for various speech segment durations.
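The abstract uses the ILPR signal but does not say how it is computed. As the ILPR is commonly described, LP coefficients are estimated from the pre-emphasized speech and the resulting inverse filter is applied to the original (not pre-emphasized) speech. The pure-Python sketch below follows that description under stated assumptions: the 25 ms Hann-windowed analysis frames, 10 ms hop, LP order of fs/1000 + 2, and pre-emphasis factor 0.97 are typical choices, not values reported in this work, and the block-wise inverse filtering is a simplification rather than the authors' implementation.

```python
import math


def pre_emphasize(x, alpha=0.97):
    """First-order pre-emphasis y[n] = x[n] - alpha * x[n-1] (alpha is an assumed typical value)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def lpc(frame, order):
    """LP error-filter coefficients [1, a1, ..., ap] (autocorrelation method, Levinson-Durbin).

    The residual is e[n] = sum_k a[k] * s[n - k] with a[0] = 1.
    """
    n = len(frame)
    r = [sum(frame[i] * frame[i - k] for i in range(k, n)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    if err <= 0.0:  # silent frame: fall back to the identity inverse filter
        return a
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for order i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:  # numerically singular: keep the filter found so far
            break
    return a


def ilpr(speech, fs, order=None, frame_ms=25.0, hop_ms=10.0):
    """Sketch of an ILPR signal (assumed parameters, simplified block-wise filtering).

    LP coefficients come from Hann-windowed frames of the PRE-EMPHASIZED speech;
    the inverse filter is applied to the ORIGINAL speech.
    """
    order = order or int(fs / 1000) + 2  # assumed rule of thumb for the LP order
    flen = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    pre = pre_emphasize(speech)
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / (flen - 1)) for i in range(flen)]
    out = [0.0] * len(speech)
    for start in range(0, len(speech), hop):
        # Analysis window roughly centred on the current hop-sized block.
        lo = max(0, start + hop // 2 - flen // 2)
        seg = [s * w for s, w in zip(pre[lo:lo + flen], win)]
        a = lpc(seg, order)
        # Inverse-filter the original speech in this block; past samples are taken
        # from the signal itself, so the filter history is never reset.
        for n in range(start, min(start + hop, len(speech))):
            out[n] = sum(a[k] * speech[n - k] for k in range(len(a)) if n - k >= 0)
    return out
```

For a steady 200 Hz tone sampled at 8 kHz, the residual is much weaker than the input, because the per-block inverse filter places zeros near the tone frequency.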
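Both the proposed feature and the baseline are log spectrograms; they differ only in the input signal (the ILPR signal versus the pre-emphasized speech). A minimal sketch of such a log power spectrogram follows. The frame length, hop, Hann window, FFT size, and the small constant `eps` are assumptions, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def log_spectrogram(x, fs, frame_ms=25.0, hop_ms=10.0, n_fft=512, eps=1e-10):
    """Log power spectrogram: one row per frame, n_fft // 2 + 1 bins per row.

    A naive DFT keeps the sketch self-contained; a real pipeline would use an
    FFT library. Frame, hop, FFT size and eps are assumed values.
    """
    flen = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    if n_fft < flen:
        raise ValueError("n_fft must be at least the frame length")
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / (flen - 1)) for i in range(flen)]
    rows = []
    for start in range(0, len(x) - flen + 1, hop):
        # Windowed frame; the remaining n_fft - flen samples are implicitly zero.
        seg = [x[start + i] * win[i] for i in range(flen)]
        row = []
        for k in range(n_fft // 2 + 1):
            X = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(flen))
            row.append(math.log(abs(X) ** 2 + eps))
        rows.append(row)
    return rows
```

For the baseline, `x` would be the pre-emphasized speech; for the proposed feature, it would be the ILPR signal. A 1 kHz tone at 8 kHz with `n_fft=256` peaks in bin 1000 / 8000 * 256 = 32 in every frame.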
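The abstract says that the autoencoder sub-network helps the classifier learn discriminative embeddings, but it does not give the training objective. One common way to couple the two is to minimize a weighted sum of a classification loss and a reconstruction loss. The sketch below assumes binary cross-entropy on the classifier's shout probability plus mean squared reconstruction error on a flattened spectrogram patch, with a hypothetical weight `lam`. It illustrates that assumption and is not the loss reported in this work.

```python
import math


def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one segment; p = predicted P(shouted), y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def mse(x, x_hat):
    """Mean squared reconstruction error between an input patch and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def joint_loss(p_shout, label, spec, spec_recon, lam=0.5):
    """Assumed joint objective: classification loss + lam * autoencoder reconstruction loss."""
    return bce(p_shout, label) + lam * mse(spec, spec_recon)
```

With `lam = 0`, this reduces to training the classifier alone. Larger values push the shared embedding to retain more of the information needed to reconstruct the input spectrogram.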
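The classifier is described only as using BiGRUs and an attention mechanism. A frequent design pools the frame-level recurrent outputs with attention weights before a final decision layer. The sketch below assumes additive attention pooling followed by a logistic output. The BiGRU itself is omitted, `H` stands for its frame-level outputs (for example, concatenated forward and backward states), and the parameters `W`, `b`, `v`, `w_out`, `b_out` are hypothetical.

```python
import math


def additive_attention_pool(H, W, b, v):
    """Attention pooling over frames (assumed additive form).

    H: T x D frame-level outputs; W: A x D, b: length A, v: length A (hypothetical).
    Returns (context vector of length D, attention weights of length T).
    """
    scores = []
    for h in H:
        u = [math.tanh(sum(W[a][d] * h[d] for d in range(len(h))) + b[a]) for a in range(len(b))]
        scores.append(sum(v[a] * u[a] for a in range(len(v))))
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alpha = [e / z for e in exps]
    context = [sum(alpha[t] * H[t][d] for t in range(len(H))) for d in range(len(H[0]))]
    return context, alpha


def shout_probability(context, w_out, b_out):
    """Logistic output layer giving P(shouted | segment), computed without overflow."""
    z = sum(c * w for c, w in zip(context, w_out)) + b_out
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

With `v` set to zeros, every frame receives the same weight and the context vector is the mean of the frames. Learned weights instead let frames with strong shout cues dominate the segment-level decision.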
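Score-level fusion combines the outputs of the classifier trained on the ILPR feature and the classifier trained on the baseline feature. The abstract does not specify the fusion rule. The sketch below assumes a convex combination of the two classifiers' per-segment shout probabilities, a 0.5 decision threshold, and a simple grid search for the weight on held-out data.

```python
def fuse_scores(s_src, s_base, w=0.5):
    """Weighted average of per-segment P(shouted) from the source and baseline classifiers."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * b for a, b in zip(s_src, s_base)]


def accuracy(scores, labels, threshold=0.5):
    """Fraction of segments whose thresholded score matches the label (1 = shouted)."""
    return sum((s >= threshold) == bool(y) for s, y in zip(scores, labels)) / len(labels)


def tune_weight(s_src, s_base, labels, steps=11):
    """Grid-search the fusion weight on held-out data; returns (best_w, best_accuracy)."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps):
        w = i / (steps - 1)
        acc = accuracy(fuse_scores(s_src, s_base, w), labels)
        if acc > best_acc:  # keep the smallest weight that reaches the best accuracy
            best_w, best_acc = w, acc
    return best_w, best_acc
```

Setting `w = 1` uses only the source-feature classifier and `w = 0` uses only the baseline. Intermediate weights can correct segments where one classifier alone is wrong.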