A corpus-based concatenative Mandarin singing voice synthesis system

This paper proposes a Mandarin singing voice synthesis (SVS) system. It generates a Mandarin song sung by an artificial singer from the lyrics and the music score information embedded in a MIDI file of the song. To obtain good song quality, two modules are presented: a synthesis unit selection module and a prosody and amplitude modification module. The synthesis unit selection module selects the corpus units that match the lyrics and are closest to the music score information. Adaptive-filter-based prosody and amplitude modification algorithms are then applied to the selected synthesis units. With the proposed method, the system can synthesize any Mandarin singing voice on the fly, provided that it is given a corpus of all syllables for male and female voices, respectively. To increase the efficiency of the system, the corpus is also preprocessed. Finally, a subjective evaluation based on the mean opinion score (MOS) shows that the synthesized sounds are of good quality.
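The unit selection step described above can be illustrated with a minimal Python sketch. The abstract does not give the actual selection criterion, so everything here is an assumption made for illustration: the `Unit` and `Note` records, the `select_unit` function, and the weighted sum of pitch and duration distances are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A recorded corpus unit (hypothetical record layout)."""
    syllable: str      # syllable label, e.g. pinyin "ma"
    pitch_midi: float  # average pitch as a MIDI note number
    duration_s: float  # duration in seconds


@dataclass(frozen=True)
class Note:
    """A target note from the score, paired with its lyric syllable."""
    syllable: str
    pitch_midi: int
    duration_s: float


def select_unit(note, corpus, pitch_weight=1.0, duration_weight=1.0):
    """Return the unit that matches the lyric syllable and is closest to the note.

    The cost is an assumed weighted sum of absolute pitch and duration
    differences; the weights are illustrative, not values from the paper.
    """
    # Only units whose syllable matches the lyric are eligible.
    candidates = [u for u in corpus if u.syllable == note.syllable]
    if not candidates:
        raise LookupError(f"no corpus unit for syllable {note.syllable!r}")

    def cost(unit):
        return (pitch_weight * abs(unit.pitch_midi - note.pitch_midi)
                + duration_weight * abs(unit.duration_s - note.duration_s))

    return min(candidates, key=cost)


corpus = [
    Unit("ma", 60.0, 0.5),
    Unit("ma", 67.0, 0.4),
    Unit("ni", 64.0, 0.5),
]
target = Note("ma", 65, 0.5)
best = select_unit(target, corpus)
print(best)
```

In this sketch the selected unit would still differ from the target note in pitch and duration; closing that remaining gap is the role the abstract assigns to the prosody and amplitude modification module.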
