Multidimensional scaling of systems in the Voice Conversion Challenge 2016

This study investigates how listeners judge the similarity of voice-converted voices using a talker discrimination task. The data come from the Voice Conversion Challenge 2016, in which 17 participants from around the world built voice-converted voices from a shared data set of source and target speakers. This paper describes the evaluation of similarity for four of the source-target pairs (two intra-gender and two cross-gender) in more detail. Multidimensional scaling was performed to illustrate where listeners perceived each system to be in an acoustic space relative to the source and target speakers and to the other systems.
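As a rough illustration of how multidimensional scaling turns pairwise judgements into a map, the sketch below applies classical (Torgerson) scaling to a small dissimilarity matrix. It is an assumption-laden example, not the analysis used in the study: the abstract does not say which scaling variant was applied, the function name `classical_mds` is hypothetical, and the matrix values are invented placeholders standing in for the proportion of "different talker" responses per pair.

```python
import numpy as np

def classical_mds(dissim, n_components=2):
    """Embed items in n_components dimensions from a symmetric dissimilarity matrix."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Keep the largest eigenvalues; clip negatives that arise from non-Euclidean input.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Invented placeholder data (NOT results from the study).
labels = ["source", "target", "system_A", "system_B"]
dissim = np.array([
    [0.0, 0.9, 0.7, 0.3],
    [0.9, 0.0, 0.4, 0.8],
    [0.7, 0.4, 0.0, 0.6],
    [0.3, 0.8, 0.6, 0.0],
])

coords = classical_mds(dissim)
for name, (x, y) in zip(labels, coords):
    print(f"{name:>8}: {x:+.3f} {y:+.3f}")
```

In the resulting plane, a system that listeners often confuse with the target lands near the target point, and one that still sounds like the source lands near the source point. The axes themselves have no fixed meaning and their signs are arbitrary; only the relative distances are interpretable.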
