Multilingual i-Vector Based Statistical Modeling for Music Genre Classification

In music signal processing, long-time features capture the time-series characteristics of a music signal better than strategies that model each short-time frame independently. As a typical long-time modeling strategy, the identification vector (i-vector) uses statistical modeling to represent the audio signal at the segment level. It can better capture the important elements of the music signal, and these elements may benefit music genre classification. In this paper, the i-vector based statistical feature for music genre classification is explored. In addition, to learn enough important elements of the music signal, a new multilingual i-vector feature is proposed based on a multilingual model. The experimental results show that the multilingual i-vector based models achieve better classification performance than conventional short-time modeling based methods.
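The segment-level statistical modeling behind i-vectors can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: it assumes a pre-trained diagonal-covariance UBM and a total-variability matrix T (here filled with random values purely for demonstration), and computes the standard posterior-mean point estimate of the i-vector from Baum-Welch statistics accumulated over one audio segment.

```python
import numpy as np

def ubm_posteriors(frames, means, variances, weights):
    """Per-frame component responsibilities under a diagonal-covariance GMM (UBM)."""
    log_det = np.sum(np.log(variances), axis=1)               # (C,)
    diff = frames[:, None, :] - means[None, :, :]             # (T, C, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)  # (T, C)
    log_lik = -0.5 * (mahal + log_det[None, :]) + np.log(weights)[None, :]
    log_lik -= log_lik.max(axis=1, keepdims=True)             # numerical stability
    post = np.exp(log_lik)
    return post / post.sum(axis=1, keepdims=True)             # (T, C)

def extract_ivector(frames, means, variances, weights, T_mat):
    """Posterior-mean i-vector: w = (I + T' S^-1 N T)^-1 T' S^-1 f~."""
    C, D = means.shape
    R = T_mat.shape[1]
    gamma = ubm_posteriors(frames, means, variances, weights)  # (T, C)
    n = gamma.sum(axis=0)                                      # zeroth-order stats (C,)
    f = gamma.T @ frames - n[:, None] * means                  # centered first-order stats (C, D)
    sigma = variances.reshape(C * D)
    n_rep = np.repeat(n, D)
    precision = np.eye(R) + T_mat.T @ (T_mat * (n_rep / sigma)[:, None])
    return np.linalg.solve(precision, T_mat.T @ (f.reshape(C * D) / sigma))

# Toy demo: hypothetical UBM and total-variability matrix, random "MFCC" frames.
rng = np.random.default_rng(0)
C, D, R = 8, 13, 10                 # components, feature dim, i-vector dim
means = rng.normal(size=(C, D))
variances = np.ones((C, D))
weights = np.full(C, 1.0 / C)
T_mat = 0.1 * rng.normal(size=(C * D, R))
frames = rng.normal(size=(200, D))  # one music segment's short-time features
w = extract_ivector(frames, means, variances, weights, T_mat)
print(w.shape)  # (10,)
```

In practice the UBM and T would be trained on a large music corpus (e.g. via EM, as done in speaker-verification toolkits), and the resulting fixed-length i-vector per segment would be fed to a back-end classifier for genre labels.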
