论文信息 - A robust modeling technique against training data errors for HMM-based singing voice synthesis


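1 Nagoya Institute of Technology
a) mushika@sp.nitech.ac.jp

… singing voice data, features of the singing voice are extracted and modeled with HMMs: the spectrum, which represents timbre; the fundamental frequency, which represents pitch; and the duration, which represents note length. Speech is usually modeled in phoneme units. Here, musical-score information such as note pitch and length, tempo, and dynamics markings is also taken into account, which makes the modeling more accurate. At synthesis time, the models are concatenated according to the given score. The singing voice is then produced by generating parameters that take dynamic features into account. HMM-based singing voice synthesis has the following characteristics, which synthesis methods based on unit concatenation lack:

(1) The models are trained automatically from the given data. As a result, the system captures not only voice quality but also other singing features: F0 variations such as preparation and overshoot, variations in vocal timing relative to the notes such as singing ahead of or behind the beat, and similar singing …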
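The synthesis step described above can be illustrated with a minimal Python sketch. That step concatenates state models along the score and then generates parameters that respect dynamic features. The sketch is not the paper's implementation and rests on these assumptions:

- It handles a single feature dimension, for example log F0, with diagonal covariances.
- State durations are fixed.
- The delta window is 0.5 * (c[t+1] - c[t-1]), with edge frames replicated. This is a common choice, assumed here.
- All state statistics are made-up numbers.
- The helper names `expand_states`, `solve` and `mlpg` are hypothetical.

The sketch solves the usual maximum-likelihood generation system (W^T P W) c = W^T P mu, where P is the inverse covariance.

```python
from typing import Dict, List, Tuple

# Hypothetical per-state statistics for ONE feature dimension (e.g. log F0):
# (static_mean, static_var, delta_mean, delta_var, n_frames)
State = Tuple[float, float, float, float, int]


def expand_states(states: List[State]) -> Tuple[List[float], ...]:
    """Concatenate state Gaussians along the score by repeating each for its duration."""
    mu_s: List[float] = []
    var_s: List[float] = []
    mu_d: List[float] = []
    var_d: List[float] = []
    for ms, vs, md, vd, n in states:
        mu_s += [ms] * n
        var_s += [vs] * n
        mu_d += [md] * n
        var_d += [vd] * n
    return mu_s, var_s, mu_d, var_d


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def mlpg(mu_s, var_s, mu_d, var_d) -> List[float]:
    """Static trajectory c maximising N(W c; mu, Sigma) for diagonal Sigma,
    i.e. the solution of (W^T P W) c = W^T P mu with P = Sigma^{-1}."""
    t_len = len(mu_s)
    rows: List[Tuple[Dict[int, float], float, float]] = []
    for t in range(t_len):
        rows.append(({t: 1.0}, mu_s[t], 1.0 / var_s[t]))  # static row: c_t
        # Delta row: 0.5 * (c_{t+1} - c_{t-1}); edge frames are replicated.
        lo, hi = max(t - 1, 0), min(t + 1, t_len - 1)
        d: Dict[int, float] = {}
        d[lo] = d.get(lo, 0.0) - 0.5
        d[hi] = d.get(hi, 0.0) + 0.5
        rows.append((d, mu_d[t], 1.0 / var_d[t]))
    # Accumulate W^T P W and W^T P mu row by row.
    a = [[0.0] * t_len for _ in range(t_len)]
    b = [0.0] * t_len
    for coefs, mean, prec in rows:
        for i, wi in coefs.items():
            b[i] += wi * prec * mean
            for j, wj in coefs.items():
                a[i][j] += wi * prec * wj
    return solve(a, b)


# Two hypothetical notes: log F0 steps from 0.0 to 1.0; zero delta means favour smooth motion.
score_states = [(0.0, 1.0, 0.0, 0.01, 5), (1.0, 1.0, 0.0, 0.01, 5)]
traj = mlpg(*expand_states(score_states))
print([round(v, 3) for v in traj])
```

The delta term penalises frame-to-frame change, so the abrupt step between the two notes comes out as a gradual transition instead of a jump. Practical systems typically also use delta-delta windows and exploit the banded structure of W^T P W, for example with a banded Cholesky solver, rather than the dense elimination used in this sketch.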
