The motivation for recording a bilingual database arose from the EMIME speech-to-speech translation task. In this project, we are aiming for personalized speech-to-speech translation, such that a user’s spoken input in one language is used to produce spoken output in another language while continuing to sound like the user’s voice. However, how do we measure whether our modeling attempts are successful? That is, how are we to measure whether or not a user sounds similar in two different languages? Aside from the complications associated with asking listeners to compare natural speech to synthetic speech, there is an even more fundamental question we would like to see answered first: how well do listeners judge speaker similarity across language boundaries when the stimuli consist of natural speech? To investigate this, we needed a database of bilingual data. This paper describes the design and collection of this database.
In designing the bilingual database for our talker discrimination experiments, we make two assumptions. First, we assume talker discrimination is easier when the different languages spoken by individual talkers are from the same language family. That is, listeners should be able to judge more accurately whether or not a talker is the same when the talker is speaking two closely related languages. Second, if bilingual talkers are highly fluent in their two languages, talker discrimination should be more difficult. Anecdotal evidence seems to suggest that proficient non-native talkers of English do not necessarily sound like the same person when speaking their native language.
The languages under consideration in EMIME are Japanese, Mandarin, Finnish and English. The English/German and English/Finnish portions of the EMIME database are described in [1]. This report covers the English/Mandarin portion of the recordings.
The aim of the experiment described in this paper is to select the talkers with the least degree of perceived foreign accent because, as stated above, we expect that the more native a bilingual talker sounds in English, the more difficult it will be for listeners to recognize them as the same talker in both their native language (L1) and their second language (L2). This paper therefore addresses the following question: which Mandarin talkers in the EMIME database have the least degree of perceived foreign accent?
[1] Krzysztof Marasek, et al. SPEECON – Speech Databases for Consumer Devices: Database Specification and Validation. LREC, 2002.
[2] Simon King, et al. The Blizzard Challenge 2009. 2009.
[3] Janet M. Baker, et al. The Design for the Wall Street Journal-based CSR Corpus. HLT, 1992.
[4] M. Wester. The EMIME Bilingual Database. 2010.
[5] Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation. MTSUMMIT, 2005.
[6] J. Flege, et al. Talker and listener effects on degree of perceived foreign accent. The Journal of the Acoustical Society of America, 1992.
[7] Simon King, et al. The Blizzard Challenge 2008. 2008.