Language Identification Using Spectrogram Texture

This paper proposes a novel front-end for automatic spoken language recognition, based on the spectrogram representation of the speech signal and on the ability of the Fourier spectrum to detect global periodicity in an image. The Local Phase Quantization (LPQ) texture descriptor is used to capture the spectrogram content. Results obtained on 30-second test signals show that the method is very promising for low-cost language identification. The best performance is achieved when the proposed method is fused with the i-vector representation.
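As a rough illustration of the front-end described above, the sketch below computes a log-magnitude spectrogram and summarizes it with a basic LPQ histogram. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `log_spectrogram` and `lpq_histogram`, the frame length, the hop size, and the 7x7 LPQ window are all hypothetical choices, and the decorrelation step of the original LPQ formulation is omitted. The abstract does not specify these settings, nor the classifier or the fusion scheme.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (frequency x time) of a 1-D signal.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    frames = sliding_window_view(signal, frame_len)[::hop]
    spectrum = np.fft.rfft(frames * np.hanning(frame_len), axis=1)
    return np.log1p(np.abs(spectrum)).T


def lpq_histogram(image, win=7):
    """Basic LPQ descriptor: normalized 256-bin histogram of phase codes.

    Simplified variant: uniform window, no decorrelation step.
    """
    r = win // 2
    offsets = np.arange(-r, r + 1)
    w0 = np.ones(win, dtype=complex)           # zero frequency
    w1 = np.exp(-2j * np.pi * offsets / win)   # frequency a = 1/win
    w2 = np.conj(w1)                           # frequency -a

    # All win x win neighbourhoods, shape (H-win+1, W-win+1, win, win).
    patches = sliding_window_view(image.astype(complex), (win, win))

    def response(wy, wx):
        # Separable local Fourier coefficient: wy along rows, wx along columns.
        return np.einsum("ijkl,k,l->ij", patches, wy, wx)

    # Four low-frequency coefficients: horizontal, vertical, two diagonals.
    coeffs = [response(w0, w1), response(w1, w0),
              response(w1, w1), response(w1, w2)]

    # Quantize the signs of the real and imaginary parts into an 8-bit code.
    codes = np.zeros(coeffs[0].shape, dtype=np.int64)
    bit = 0
    for c in coeffs:
        for part in (c.real, c.imag):
            codes += (part > 0).astype(np.int64) << bit
            bit += 1

    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


if __name__ == "__main__":
    # Synthetic tone standing in for a speech recording (assumption).
    t = np.arange(4000) / 8000.0
    spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
    features = lpq_histogram(spec)
    print(spec.shape, features.shape)
```

The resulting 256-dimensional histogram would then serve as the feature vector for a classifier; how such features are combined with i-vectors is left open here because the abstract does not describe it.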
