Speech-nonspeech discrimination based on speech-relevant spectrogram modulations

In this work, we adopt an information-theoretic approach, the Information Bottleneck method, to extract the relevant modulation frequencies across both dimensions of a spectrogram for discriminating speech from non-speech sounds (music, animal vocalizations, environmental noises). A compact representation consisting of the maximally informative features is built for each sound ensemble. We demonstrate the effectiveness of a simple thresholding classifier based on the similarity of a sound to each characteristic modulation spectrum. When classification performance is assessed with the F-measure at various SNR conditions, the results are comparable to those of a recently proposed method that uses the same features but has significantly greater complexity.
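The thresholding classifier and the F-measure evaluation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so cosine similarity is assumed here, and the names `classify`, `speech_template`, `nonspeech_template`, and `threshold` are hypothetical. Each modulation spectrum is assumed to be already flattened into a plain list of non-negative values.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no similarity information
    return dot / (norm_a * norm_b)


def classify(features, speech_template, nonspeech_template, threshold=0.0):
    """Label a sound by thresholding the difference of its similarities
    to the two characteristic modulation spectra (assumed decision rule)."""
    score = (cosine_similarity(features, speech_template)
             - cosine_similarity(features, nonspeech_template))
    return "speech" if score > threshold else "nonspeech"


def f_measure(predicted, actual, positive="speech"):
    """Balanced F-measure: harmonic mean of precision and recall."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In such a scheme the threshold would be tuned on held-out data, for instance separately for each SNR condition, and the F-measure would then be computed on the resulting labels.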
