Robust unsupervised detection of human screams in noisy acoustic environments

This study focuses on an unsupervised approach for detecting human scream vocalizations in continuous recordings from noisy acoustic environments. The proposed detection solution is based on compound segmentation, which employs weighted mean distance, the T²-statistic, and the Bayesian Information Criterion (BIC) to detect screams. In a preliminary stage, the solution also employs an unsupervised, threshold-optimized Combo-SAD to remove non-vocal noisy segments. Five noisy environments were simulated at noise levels ranging from -20 dB to +20 dB SNR. Performance of the proposed system was compared using two alternative acoustic front-end features: (i) Mel-frequency cepstral coefficients (MFCC) and (ii) perceptual minimum variance distortionless response (PMVDR). Evaluation results show that the new scream detection solution works well for clean conditions and at +20 dB and +10 dB SNR, with performance declining as SNR decreases to -20 dB across a number of the noise sources considered.
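As an illustration of the Bayesian Information Criterion test named above, the following is a minimal sketch of the standard single-change-point ΔBIC score for a window of feature frames, where each side of a candidate boundary is modeled as one full-covariance Gaussian. It is not the authors' implementation and omits the weighted mean distance and T² stages. The function name `delta_bic`, the `penalty_weight` parameter, and the synthetic frames standing in for MFCC or PMVDR features are assumptions made for this example.

```python
import numpy as np


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta-BIC for a candidate change point at index `split`.

    frames: array of shape (n_frames, n_dims), e.g. precomputed feature vectors.
    A positive value favors modeling the window as two segments
    (a change at `split`); a negative value favors a single segment.
    `penalty_weight` is the usual tunable lambda (hypothetical default of 1.0).
    """
    frames = np.asarray(frames, dtype=float)
    n, d = frames.shape
    left, right = frames[:split], frames[split:]

    def logdet_cov(x):
        # Maximum-likelihood covariance (bias=True divides by N, not N - 1).
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain from splitting one Gaussian into two.
    gain = 0.5 * (
        n * logdet_cov(frames)
        - len(left) * logdet_cov(left)
        - len(right) * logdet_cov(right)
    )
    # Penalty for the extra mean vector and covariance matrix.
    n_params = d + 0.5 * d * (d + 1)
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for feature frames: two segments with different means.
    seg_a = rng.normal(0.0, 1.0, size=(200, 4))
    seg_b = rng.normal(5.0, 1.0, size=(200, 4))
    changed = np.vstack([seg_a, seg_b])
    homogeneous = rng.normal(0.0, 1.0, size=(400, 4))
    print("change present:", delta_bic(changed, 200) > 0)
    print("no change:", delta_bic(homogeneous, 200) > 0)
```

In a segmentation pass, this score would typically be evaluated over candidate split points within a sliding window, and the maximum positive value taken as a boundary. Both sides of the split need enough frames for a stable covariance estimate.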
