Implementation of Computationally Efficient Real-Time Voice Conversion

This paper presents an implementation of real-time statistical voice conversion (VC) based on Gaussian mixture models (GMMs). Real-time conversion processing is essential for developing VC applications that enhance human-to-human speech communication. It is also useful to reduce the computational complexity of the conversion processing so that VC applications remain usable even on devices with limited computational resources. We propose a real-time VC method based on a low-delay conversion algorithm that considers dynamic features and a global variance. We further propose a computationally efficient VC method based on rapid source feature extraction and diagonalization of full covariance matrices. Experimental results show that the proposed methods work reasonably well.