Paper Information - Noise Robust Music Artist Recognition Using I-Vector Features

Noise Robust Music Artist Recognition Using I-Vector Features

In music information retrieval (MIR), dealing with different types of noise is important, as MIR models are frequently used in noisy environments such as live performances. Recently, i-vector features have shown great promise for some major tasks in MIR, such as music similarity and artist recognition. In this paper, we introduce a novel noise-robust music artist recognition system using i-vector features. Our method uses a short sample of noise to learn the parameters of the noise and then applies maximum a posteriori (MAP) estimation to estimate clean i-vectors given noisy i-vectors. We examine the performance of multiple systems confronted with different kinds of additive noise in a clean-training, noisy-testing scenario. Using open-source tools, we synthesized 12 noisy versions of a standard 20-class music artist recognition dataset by applying 4 kinds of additive noise at 3 different signal-to-noise ratios (SNRs). Using these datasets, we carried out music artist recognition experiments comparing the proposed method with the state of the art. The results suggest that the proposed method outperforms the state of the art.
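The abstract does not spell out the estimator, so the following is only a minimal sketch of what a MAP estimate of a clean i-vector could look like under simplifying assumptions that are not taken from the paper: the noisy i-vector is modeled as clean i-vector plus noise, both terms are Gaussian, and both covariances are diagonal so that each dimension can be handled independently. The function names `fit_diag_gaussian` and `map_clean_ivector` and all numbers are hypothetical.

```python
from statistics import fmean, pvariance


def fit_diag_gaussian(vectors):
    """Return per-dimension means and variances of equal-length vectors.

    Assumption: a diagonal Gaussian is an adequate model of the data.
    """
    dims = list(zip(*vectors))  # transpose: one tuple per dimension
    means = [fmean(d) for d in dims]
    variances = [pvariance(d, mu) for d, mu in zip(dims, means)]
    return means, variances


def map_clean_ivector(noisy, clean_mean, clean_var, noise_mean, noise_var):
    """MAP estimate of a clean vector x from y = x + n, per dimension.

    Assumed model (not from the paper): x ~ N(clean_mean, clean_var) and
    n ~ N(noise_mean, noise_var), independent, with diagonal covariances.
    The posterior of x given y is then Gaussian, and its mode is the
    precision-weighted average of the prior mean and (y - noise_mean).
    """
    estimate = []
    for y, mx, vx, mn, vn in zip(noisy, clean_mean, clean_var,
                                 noise_mean, noise_var):
        precision = 1.0 / vx + 1.0 / vn
        estimate.append((mx / vx + (y - mn) / vn) / precision)
    return estimate


# Hypothetical two-dimensional example with made-up parameters.
clean_mean, clean_var = [0.0, 0.0], [1.0, 1.0]
noise_mean, noise_var = [0.5, -0.5], [1.0, 3.0]
print(map_clean_ivector([1.5, 0.5], clean_mean, clean_var,
                        noise_mean, noise_var))
```

In this sketch, a dimension with large noise variance is pulled more strongly toward the prior mean of the clean i-vectors, while a dimension with small noise variance stays close to the noise-compensated observation. Real i-vectors have hundreds of dimensions and generally call for full covariance matrices.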

Gerhard Widmer | Hamid Eghbal-zadeh
