Paper information - Data augmentation with moment-matching networks for i-vector based speaker verification

Data augmentation with moment-matching networks for i-vector based speaker verification

This paper proposes an i-vector generation scheme with conditional generative moment-matching networks (MMNs) for speaker verification. In this scheme, multiple i-vectors for each enrollment speaker are randomly generated from trained MMNs and noise distributions. The randomly generated i-vectors are assumed to represent diverse variations of each enrollment speaker. Since this paper aims to explore the possibility of i-vector augmentation with MMNs, a preliminary i-vector-based speaker verification evaluation with a support vector machine (SVM) is performed. The SVM classification results show that the generated i-vectors contribute to estimating accurate SVM classifiers for the enrollment speakers. Based on the experimental results, we also compare the distributions of the generated i-vectors with those of the original ones and discuss them.

Shinnosuke Takamichi | Tomoko Matsui | Sayaka Shiota
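The moment-matching idea behind the abstract can be illustrated with a minimal sketch. Generative moment-matching networks are trained to reduce the maximum mean discrepancy (MMD) between real and generated samples; the code below computes a biased estimate of the squared MMD with a Gaussian kernel. It is a simplified, unconditional illustration only: the kernel choice, the bandwidth, the 4-dimensional toy vectors, and the helper names (`gaussian_kernel`, `mmd_squared`, `sample_vectors`) are assumptions for this example and are not taken from the paper, which uses a conditional variant.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(real, generated, bandwidth=1.0):
    """Biased estimate of the squared MMD between two sets of vectors."""
    def mean_kernel(xs, ys):
        # Average kernel value over all pairs (x, y).
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(real, real)
            - 2.0 * mean_kernel(real, generated)
            + mean_kernel(generated, generated))


def sample_vectors(n, dim, mean, rng):
    """Hypothetical stand-in for i-vectors: isotropic Gaussian samples."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
real = sample_vectors(50, 4, 0.0, rng)   # toy "original" vectors
close = sample_vectors(50, 4, 0.0, rng)  # same distribution as `real`
far = sample_vectors(50, 4, 3.0, rng)    # shifted distribution

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
print(f"MMD^2 (same distribution):    {mmd_close:.4f}")
print(f"MMD^2 (shifted distribution): {mmd_far:.4f}")
```

In this toy setting the estimate is small when the generated set follows the same distribution as the real set and larger when it is shifted, which is the quantity a moment-matching generator would be trained to reduce.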
