Timbral modeling for music artist recognition using i-vectors

Music artist (i.e., singer) recognition is a challenging task in Music Information Retrieval (MIR). The presence of different musical instruments, the diversity of music genres and singing techniques make the retrieval of artist-relevant information from a song difficult. Many authors tried to address this problem by using complex features or hybrid systems. In this paper, we propose new song-level timbre-related features that are built from frame-level MFCCs via so-called i-vectors. We report artist recognition results with multiple classifiers such as K-nearest neighbor, Discriminant Analysis and Naive Bayes using these new features. Our approach yields considerable improvements and outperforms existing methods. We could achieve an 84.31% accuracy using MFCC features on a 20-classes artist recognition task.

[1]  Sajad Shirali-Shahreza,et al.  Fast and scalable system for automatic artist identification , 2009, IEEE Transactions on Consumer Electronics.

[2]  Daniel P. W. Ellis,et al.  Classifying Music Audio with Timbral and Chroma Features , 2007, ISMIR.

[3]  Petri Toiviainen,et al.  A Matlab Toolbox for Music Information Retrieval , 2007, GfKl.

[4]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[5]  Patrick Kenny,et al.  Speaker and Session Variability in GMM-Based Speaker Verification , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[6]  Douglas A. Reynolds,et al.  Language Recognition via i-vectors and Dimensionality Reduction , 2011, INTERSPEECH.

[7]  Ieee Staff 2017 25th European Signal Processing Conference (EUSIPCO) , 2017 .

[8]  Beth Logan,et al.  Mel Frequency Cepstral Coefficients for Music Modeling , 2000, ISMIR.

[9]  Yi-Hsuan Yang,et al.  Sparse Modeling for Artist Identification: Exploiting Phase Information and Vocal Separation , 2013, ISMIR.

[10]  Benjamin Schrauwen,et al.  Audio-based Music Classification with a Pretrained Convolutional Network , 2011, ISMIR.

[11]  Daniel P. W. Ellis,et al.  Song-Level Features and Support Vector Machines for Music Classification , 2005, ISMIR.

[12]  G. Peeters,et al.  GMM SUPERVECTOR FOR CONTENT BASED MUSIC SIMILARITY , 2011 .

[13]  Gerald Friedland,et al.  An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content , 2013, 2013 IEEE International Symposium on Multimedia.

[14]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[15]  Douglas Eck,et al.  Aggregate features and ADABOOST for music classification , 2006, Machine Learning.

[16]  B. Scholkopf,et al.  Fisher discriminant analysis with kernels , 1999, Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

[17]  Pavel P. Kuksa Efficient multivariate kernels for sequence classification , 2014 .

[18]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[19]  Patrick Kenny,et al.  Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006 .

[20]  Youngmoo E. Kim,et al.  Singer Identification in Popular Music Recordings Using Voice Coding Features , 2002 .

[21]  Hugo Van hamme,et al.  Accent recognition using i-vector, Gaussian Mean Supervector and Gaussian posterior probability supervector for spontaneous telephone speech , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22]  Lukás Burget,et al.  Discriminatively trained Probabilistic Linear Discriminant Analysis for speaker verification , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[23]  George Tzanetakis,et al.  Musical genre classification of audio signals , 2002, IEEE Trans. Speech Audio Process..

[24]  Larry P. Heck,et al.  MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research , 2013 .

[25]  Tong Zhang,et al.  Automatic singer identification , 2003, 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698).

[26]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[27]  Rui Xia,et al.  Using i-Vector Space Model for Emotion Recognition , 2012, INTERSPEECH.

[28]  Driss Matrouf,et al.  A straightforward and efficient implementation of the factor analysis model for speaker verification , 2007, INTERSPEECH.

[29]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..