Voice Transformation Using Two-Level Dynamic Warping

Voice transformation, for example from a male speaker to a female speaker, is achieved here using two-level dynamic warping. An outer warping process temporally aligns blocks of speech (dynamic time warp) and invokes an inner warping process that spectrally aligns those blocks based on their magnitude spectra (dynamic frequency warp). The mapping function produced by the dynamic frequency warp is used to move spectral information from a source speaker to a target speaker. The information obtained by this process is then used to train an artificial neural network that produces spectral warping output from spectral input data.
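The nesting of the two warps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the local distance, the step pattern, or the block size, so the sketch assumes a textbook dynamic-programming alignment with symmetric steps and an absolute-difference distance between magnitude bins. The function names `dtw`, `frequency_warp`, and `two_level_warp` are hypothetical, and the neural-network training stage is not shown.

```python
def dtw(a, b, dist):
    """Align sequences a and b by dynamic programming.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs running from (0, 0) to (len(a) - 1, len(b) - 1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],
                                 cost[i - 1][j],
                                 cost[i][j - 1])
    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


def frequency_warp(src_mag, tgt_mag):
    """Inner warp: align the bins of two magnitude spectra.

    Assumption: absolute difference between bin magnitudes as local distance.
    """
    return dtw(src_mag, tgt_mag, lambda x, y: abs(x - y))


def two_level_warp(src_frames, tgt_frames):
    """Outer warp: align frames in time, scoring each frame pair by the
    cost of the inner frequency warp.

    Returns (total_cost, time_path, freq_maps), where freq_maps[k] is the
    bin-to-bin mapping for the k-th aligned frame pair in time_path.
    """
    def frame_dist(s, t):
        return frequency_warp(s, t)[0]

    total, time_path = dtw(src_frames, tgt_frames, frame_dist)
    freq_maps = [frequency_warp(src_frames[i], tgt_frames[j])[1]
                 for i, j in time_path]
    return total, time_path, freq_maps
```

Here each frame is a plain list of magnitude values. Calling `two_level_warp(src_frames, tgt_frames)` yields the time alignment together with one frequency mapping per aligned frame pair; in the scheme described above, mappings of this kind would serve as training targets for the network. The sketch recomputes the inner warp during backtracking for clarity rather than caching it.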
