Vocal Manipulation Based on Pitch Transcription and Its Application to Interactive Entertainment for Karaoke

This paper describes a real-time vocal manipulation system for improving karaoke. Karaoke is an interactive entertainment system, used all over the world, in which users sing along with recorded music. Users are expected to sing with accurate pitch, but this is difficult for tone-deaf people. The proposed system is intended to help them. It is built on a vocoder-based voice synthesis method that synthesizes voiced sound from a fundamental frequency (pitch) and a spectral envelope (timbre). Vocal manipulation is achieved through pitch transcription: the pitch of the tone-deaf singer is replaced with that of a professional singer, while the spectral envelope is still taken from the user's own voice. A subjective evaluation was carried out to verify the effectiveness of the proposed system. The results suggest that the proposed system can manipulate vocal sounds in real time.
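
The pitch transcription step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that pitch is available as a frame-wise F0 contour in Hz, that unvoiced frames are marked with 0, and that the user's and the professional singer's contours are already time-aligned. The function names `transcribe_pitch` and `mean_cents_error` are hypothetical, and the vocoder analysis and synthesis stages are omitted.

```python
import math


def transcribe_pitch(user_f0, reference_f0):
    """Return a new F0 contour for the user (hypothetical helper).

    Assumptions: both contours are frame-wise F0 values in Hz, already
    time-aligned, with 0 marking an unvoiced frame.
    """
    if len(user_f0) != len(reference_f0):
        raise ValueError("contours must have the same number of frames")
    out = []
    for u, r in zip(user_f0, reference_f0):
        if u > 0 and r > 0:
            out.append(float(r))   # both voiced: take the reference pitch
        elif u > 0:
            out.append(float(u))   # reference unvoiced: keep the user's pitch
        else:
            out.append(0.0)        # user unvoiced: nothing to re-pitch
    return out


def mean_cents_error(f0, reference_f0):
    """Mean absolute pitch deviation in cents over frames voiced in both."""
    devs = [
        abs(1200.0 * math.log2(f / r))
        for f, r in zip(f0, reference_f0)
        if f > 0 and r > 0
    ]
    return sum(devs) / len(devs) if devs else 0.0


# Illustrative values only, not data from the paper.
user = [0.0, 210.0, 230.0, 0.0, 250.0]
reference = [220.0, 220.0, 220.0, 0.0, 0.0]
corrected = transcribe_pitch(user, reference)
```

In a vocoder-based system, a contour such as `corrected` would be passed to the synthesizer together with the user's own spectral envelope, so that the output keeps the user's timbre while following the reference melody.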
