Voice Conversion for Dubbing Using Linear Predictive Coding and Hidden Markov Model

Dubbing is the process of adding or replacing the sound track of a film or video. Voice conversion can support dubbing, for example by producing a child’s voice for children’s films. Recording real children is often difficult: child voice talent is hard to find, and children struggle to deliver the desired tone and mood during recording. In this study, we therefore propose a cross-gender and cross-age voice conversion method that converts adult voices into children’s voices. Features are extracted with Linear Predictive Coding (LPC) and modeled with a Hidden Markov Model (HMM). The converted components are the fundamental frequency (F0) and the spectral content. In simulation tests, the best conversion results are achieved with an LPC order of 19 and an HMM with 5 states. After conversion, the F0 Root Mean Square Error (RMSE) increased by 57.7% for adult male to child conversion and by 15.29% for adult female to child conversion, and the cepstral RMSE increased by 43.69%. A subjective evaluation using the mean opinion score (MOS) was also performed: the HMM-based conversion obtained an average MOS of 2.64 for similarity and 3.23 for quality. We hope that the proposed method can be applied in practice to dubbing in the film industry, especially for Indonesian dialogue.
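
To make the feature extraction step concrete, the following is a minimal sketch of order-19 Linear Predictive Coding analysis on a single speech frame, using the autocorrelation method with the Levinson-Durbin recursion. It is an illustration only, not the implementation used in this study: the function name `lpc_coefficients`, the 400-sample frame length, the Hamming window, and the random test frame are all assumptions introduced here.

```python
import numpy as np


def lpc_coefficients(frame, order=19):
    """Estimate LPC coefficients of one frame (hypothetical helper).

    Uses the autocorrelation method and the Levinson-Durbin recursion.
    Returns (a, err), where a[0] == 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order, and err is the final
    prediction error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)

    # Autocorrelation for lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]

    for i in range(1, order + 1):
        if err <= 0.0:
            # Silent or degenerate frame: stop and keep remaining coefficients at 0.
            break
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        # Update coefficients 1..i-1 from the order i-1 solution, then set a[i].
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        # Prediction error energy shrinks by (1 - k^2) at each order.
        err *= 1.0 - k * k

    return a, err


# Usage on a synthetic frame (assumed 400 samples, Hamming-windowed).
rng = np.random.default_rng(0)
demo_frame = rng.standard_normal(400) * np.hamming(400)
demo_a, demo_err = lpc_coefficients(demo_frame, order=19)
print(len(demo_a))  # order + 1 coefficients, including the leading 1
```

In a full system, one such coefficient vector would be computed per frame, and the resulting sequence would serve as the observation sequence for the Hidden Markov Model. How frames are segmented and how the coefficients are mapped between speakers is not specified here and is left out of the sketch.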