Noise-robust voice conversion using a small parallel data based on non-negative matrix factorization

This paper presents a novel voice conversion (VC) framework based on non-negative matrix factorization (NMF) that requires only a small parallel corpus. In our previous work, we proposed an NMF-based VC technique for noisy environments. That technique requires a parallel dictionary of exemplars, consisting of source exemplars and target exemplars extracted from the same texts uttered by the source and target speakers, and, as in conventional GMM-based VC, it relies on a large parallel corpus to construct the conversion function. In this paper, we introduce an adaptation matrix into the NMF framework to adapt the source dictionary to the target dictionary. This adaptation matrix is estimated using only a small parallel speech corpus. The effectiveness of the proposed method is confirmed by comparing it with a conventional NMF-based method and a GMM-based method in a noisy environment.
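The abstract describes the pipeline only at a high level, so the following is a minimal sketch of one plausible reading, not the paper's exact formulation. It assumes magnitude-spectrum frames stored as columns, KL-divergence NMF with fixed dictionaries (plain multiplicative updates, no sparsity penalty), noise handled by appending noise exemplars to the source dictionary, and the adaptation matrix modeled as a hypothetical d x d non-negative matrix `A` that maps source-dictionary reconstructions onto time-aligned target frames from the small parallel set. The function names (`estimate_activations`, `estimate_adaptation`, `convert`) and the synthetic data are illustrative only.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def kl_div(V, R):
    """Generalized KL divergence D(V || R), the cost minimized by the updates below."""
    return float(np.sum(V * np.log((V + EPS) / (R + EPS)) - V + R))


def estimate_activations(V, W, n_iter=300):
    """Find H >= 0 with V ~= W @ H while the dictionary W stays fixed
    (multiplicative update for the generalized KL divergence)."""
    H = np.ones((W.shape[1], V.shape[1]))
    col_sums = W.sum(axis=0)[:, None] + EPS  # W^T 1, one value per exemplar
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + EPS))) / col_sums
    return H


def estimate_adaptation(X_src, X_tgt, W_src, n_iter=300):
    """Hypothetical adaptation step: learn a non-negative d x d matrix A such that
    X_tgt ~= A @ W_src @ H, where H are the activations of the small parallel
    source set on the source dictionary. The adapted target dictionary is A @ W_src."""
    B = W_src @ estimate_activations(X_src, W_src, n_iter)  # source-side reconstruction
    A = np.ones((X_tgt.shape[0], B.shape[0]))
    row_sums = B.sum(axis=1)[None, :] + EPS  # 1 B^T, broadcast over rows of A
    for _ in range(n_iter):
        A *= ((X_tgt / (A @ B + EPS)) @ B.T) / row_sums
    return A


def convert(X_noisy, W_src, W_noise, A, n_iter=300):
    """Decompose noisy source frames on [speech | noise] exemplars, keep only the
    speech activations, and render them through the adapted dictionary A @ W_src."""
    W = np.hstack([W_src, W_noise])
    H = estimate_activations(X_noisy, W, n_iter)
    H_speech = H[: W_src.shape[1]]  # drop the noise activations
    return (A @ W_src) @ H_speech


# Usage on synthetic, non-negative data (stand-ins for spectra, not real speech).
rng = np.random.default_rng(0)
d, K, Kn, T = 20, 30, 5, 40
W_src = rng.random((d, K))                        # source speech exemplars (columns)
W_noise = rng.random((d, Kn))                     # noise exemplars
X_src = W_src @ rng.random((K, T))                # small parallel set, source side
X_tgt = (np.eye(d) + 0.1 * rng.random((d, d))) @ X_src  # synthetic target side
A = estimate_adaptation(X_src, X_tgt, W_src)
Y = convert(X_src + W_noise @ rng.random((Kn, T)), W_src, W_noise, A)
print(Y.shape)  # (20, 40)
```

Modeling the adaptation as a left-multiplied matrix keeps the adapted dictionary non-negative and lets `A` be fitted with the same multiplicative-update machinery as the activations; the actual parameterization used in the paper may differ.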
