论文信息 - Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis

Frame-level acoustic modeling based on Gaussian process regression for statistical nonparametric speech synthesis

This paper proposes a new approach to text-to-speech based on Gaussian processes which are widely used to perform non-parametric Bayesian regression and classification. The Gaussian process regression model is designed for the prediction of frame-level acoustic features from the corresponding frame information. The frame information includes relative position in the phone and preceding and succeeding phoneme information obtained from linguistic information. In this paper, a frame context kernel is proposed as a similarity measure of respective frames. Experimental results using a small data set show the potential of the proposed approach without state-dependent dynamic features or decision-tree clustering used in a conventional HMM-based approach.

[1] Nello Cristianini,et al. Kernel Methods for Pattern Analysis , 2003, ICTAI.

[2] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.

[3] Takashi Fukuda,et al. Orthogonalized Distinctive Phonetic Feature Extraction for Noise-Robust Automatic Speech Recognition , 2004, IEICE Trans. Inf. Syst..

[4] Keiichi Tokuda,et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.

[5] Sunho Park,et al. Gaussian process regression for voice activity detection and speech enhancement , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[6] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun..

[7] Gustav Eje Henter,et al. Gaussian process dynamical models for nonparametric speech representation and synthesis , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] Heiga Zen,et al. Gaussian Process Experts for Voice Conversion , 2011, INTERSPEECH.

[9] Carl E. Rasmussen,et al. Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.