Paper information - Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation

STRAIGHT, a very high quality speech analysis, modification and synthesis system, has now been implemented in C and operates in realtime. This article first provides a brief summary of the STRAIGHT components and then introduces the underlying principles that enable realtime operation. The extended pitch synchronous analysis built into STRAIGHT, which does not require analysis window alignment, plays an important role in the realtime implementation. A detailed description of the processing steps, which are based on the so-called “just-in-time” architecture, is presented. Other issues related to realtime implementation, along with performance measures, are also discussed. The software will be available to researchers upon request.
