Temporally Variable Multi-attribute Morphing of Arbitrarily Many Voices for Exploratory Research of Speech Prosody

Morphing provides a flexible research strategy for studying non- and paralinguistic aspects of speech. A recent extension of the morphing procedure makes it possible to interpolate and extrapolate the physical attributes of arbitrarily many example utterances. By using utterances that represent typical instantiations of the non- and paralinguistic information in question, and by systematically perturbing trajectories in the high-dimensional space spanned by a set of indexed weights on the physical parameters of those utterances, the physical correlates of such information can be represented in terms of differential-geometric concepts. The extended morphing framework is formulated in a generalized representation, and a few representative applications are discussed, along with the limitations of the current implementation and possible solutions.
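
The idea of indexed weights can be illustrated with a minimal sketch. It is not the formulation used in the paper: it assumes that each voice has already been analyzed into time-aligned attribute tracks on a scale where linear blending is meaningful (for example, log F0), and that the weights over voices sum to 1 at every frame. The function names morph_attribute and morph, the attribute names, and the sum-to-one constraint are all hypothetical choices made for illustration. Weights between 0 and 1 interpolate among the voices, and weights outside that range extrapolate.

```python
import math


def morph_attribute(tracks, weights):
    """Blend one attribute across N time-aligned voices, frame by frame.

    tracks:  N lists of equal length T, one attribute track per voice
    weights: T lists of N weights, one weight vector per frame
    Assumption (not from the paper): weights sum to 1 at each frame.
    """
    n_frames = len(tracks[0])
    if any(len(track) != n_frames for track in tracks) or len(weights) != n_frames:
        raise ValueError("tracks and weights must cover the same frames")
    morphed = []
    for t, frame_weights in enumerate(weights):
        if len(frame_weights) != len(tracks):
            raise ValueError("one weight per voice is required")
        if not math.isclose(sum(frame_weights), 1.0, abs_tol=1e-9):
            raise ValueError("weights at each frame are assumed to sum to 1")
        # Weighted sum over voices; negative or >1 weights extrapolate.
        morphed.append(sum(w * track[t] for w, track in zip(frame_weights, tracks)))
    return morphed


def morph(voices, weights_by_attribute):
    """Morph each attribute with its own time-varying weight trajectory.

    voices:               list of dicts mapping attribute name -> track
    weights_by_attribute: dict mapping attribute name -> per-frame weights
    """
    return {
        name: morph_attribute([voice[name] for voice in voices], weights)
        for name, weights in weights_by_attribute.items()
    }


# Three hypothetical voices with constant log-F0 tracks over three frames.
voices = [
    {"log_f0": [math.log(100.0)] * 3},
    {"log_f0": [math.log(200.0)] * 3},
    {"log_f0": [math.log(400.0)] * 3},
]
# Frame 0 copies voice 0, frame 1 averages voices 0 and 1,
# frame 2 extrapolates past voice 1 along the voice 0 -> voice 1 direction.
trajectory = {"log_f0": [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [-0.5, 1.5, 0.0]]}
morphed = morph(voices, trajectory)
```

Because each attribute carries its own weight trajectory, one attribute can be moved toward a given voice while the others are held fixed, and perturbing a single trajectory corresponds to probing one direction in the weight space.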
