Noise Robust Music Artist Recognition Using I-Vector Features

In music information retrieval (MIR), dealing with different types of noise is important and the MIR models are frequently used in noisy environments such as live performances. Recently, i-vector features have shown great promise for some major tasks in MIR, such as music similarity and artist recognition. In this paper, we introduce a novel noise-robust music artist recognition system using i-vector features. Our method uses a short sample of noise to learn the parameters of noise, then using a Maximum A Postriori (MAP) estimation it estimates clean i-vectors given noisy i-vectors. We examine the performance of multiple systems confronted with different kinds of additive noise in a clean training noisy testing scenario. Using open-source tools, we have synthesized 12 different noisy versions from a standard 20-class music artist recognition dataset encountered with 4 different kinds of additive noise with 3 different Signal-to-Noise-Ratio (SNR). Using these datasets, we carried out music artist recognition experiments comparing the proposed method with the state-ofthe-art. The results suggest that the proposed method outperforms the state-of-the-art.

[1]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[2]  Driss Matrouf,et al.  Additive noise compensation in the i-vector space for speaker recognition , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Larry P. Heck,et al.  MSR Identity Toolbox v1.0: A MATLAB Toolbox for Speaker Recognition Research , 2013 .

[4]  Driss Matrouf,et al.  Robust Speaker Recognition Using MAP Estimation of Additive Noise in i-vectors Space , 2014, SLSP.

[5]  Daniel Garcia-Romero,et al.  Analysis of i-vector Length Normalization in Speaker Recognition Systems , 2011, INTERSPEECH.

[6]  Patrick Kenny,et al.  Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006 .

[7]  John H. L. Hansen,et al.  Assessment of single-channel speech enhancement techniques for speaker identification under mismatched conditions , 2010, INTERSPEECH.

[8]  Rui Xia,et al.  Using i-Vector Space Model for Emotion Recognition , 2012, INTERSPEECH.

[9]  Peter Knees,et al.  USING BLOCK-LEVEL FEATURES FOR GENRE CLASSIFICATION , TAG CLASSIFICATION AND MUSIC SIMILARITY ESTIMATION , 2010 .

[10]  Yun Lei,et al.  Simplified VTS-based I-vector extraction in noise-robust speaker recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Anna M. Kruspe,et al.  Improving Singing Language Identification through i-Vector Extraction , 2014, DAFx.

[12]  Daniel P. W. Ellis,et al.  Classifying Music Audio with Timbral and Chroma Features , 2007, ISMIR.

[13]  A. Cuhadar,et al.  Evaluation of Speech Enhancement Techniques for Speaker Identification in Noisy Environments , 2007, Ninth IEEE International Symposium on Multimedia Workshops (ISMW 2007).

[14]  Douglas A. Reynolds,et al.  Language Recognition via i-vectors and Dimensionality Reduction , 2011, INTERSPEECH.

[15]  Yun Lei,et al.  Towards noise-robust speaker recognition using probabilistic linear discriminant analysis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[16]  Markus Schedl,et al.  Timbral modeling for music artist recognition using i-vectors , 2015, 2015 23rd European Signal Processing Conference (EUSIPCO).

[17]  Driss Matrouf,et al.  Dealing with additive noise in speaker recognition systems based on i-vector approach , 2015, 2015 23rd European Signal Processing Conference (EUSIPCO).

[18]  Markus Schedl,et al.  I-Vectors for Timbre-Based Music Similarity and Music Artist Classification , 2015, ISMIR.

[19]  Yun Lei,et al.  Unscented transform for ivector-based noisy speaker recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20]  Klaus Seyerlehner FUSING BLOCK-LEVEL FEATURES FOR MUSIC SIMILARITY ESTIMATION , 2010 .

[21]  James H. Elder,et al.  Probabilistic Linear Discriminant Analysis for Inferences About Identity , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[22]  Gerald Friedland,et al.  An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content , 2013, 2013 IEEE International Symposium on Multimedia.

[23]  Sebastian Ewert,et al.  The Audio Degradation Toolbox and Its Application to Robustness Evaluation , 2013, ISMIR.

[24]  Hugo Van hamme,et al.  Accent recognition using i-vector, Gaussian Mean Supervector and Gaussian posterior probability supervector for spontaneous telephone speech , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[25]  James R. Glass,et al.  Cosine Similarity Scoring without Score Normalization Techniques , 2010, Odyssey.

[26]  Paavo Alku,et al.  Comparing spectrum estimators in speaker verification under additive noise degradation , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[27]  Patrick Kenny,et al.  Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[28]  Yun Lei,et al.  A noise robust i-vector extractor using vector taylor series for speaker recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.