Catia Cucchiarini, Helmer Strik and Lou Boves
University of Nijmegen

1. Introduction

Every year in the Netherlands many foreigners take part in examinations aimed at testing their proficiency in Dutch. In order to achieve greater efficiency and lower costs, attempts are being made to automate at least part of the testing procedure. Automatic testing of receptive skills such as reading and listening appears to be relatively simple, because the response tasks that are often used (multiple choice, matching and cloze) are easy to score. Developing computer tests for productive skills such as speaking and writing is more difficult because of the open-ended nature of the input. On the other hand, it is precisely for testing these latter skills that extremely high costs are incurred, because the task human raters have to carry out is very time-consuming.

Recent advances in speech recognition research suggest that computers can be used to test at least some aspects of oral proficiency. For instance, Bernstein et al. (1990), Hiller et al. (1994), Eskenazi (1996) and Neumeyer et al. (1996) describe automatic methods for evaluating English pronunciation. In 1996 we started a research project which aims at developing a similar system for automatic assessment of foreign speakers' pronunciation of Dutch. In this project the University of Nijmegen cooperates with the Dutch National Institute for Educational Measurement (CITO), Swets Test Services of Swets & Zeitlinger and PTT Telecom.

In this paper we first describe the goals of the present experiment (section 2). We then go on to consider how this study differs from previous ones (section 3). In section 4 the methodology is described. The results of this experiment are presented in section 5. Finally, in section 6 the results are discussed and some conclusions are drawn.

2. Aims of the present study

Given the successful attempts at developing automatic pronunciation testing systems for English, we decided to develop a similar test for assessing foreign speakers' pronunciation of Dutch. To this end we used the automatic speech recognizer developed at the University of Nijmegen. Some of the information concerning this recognizer is provided below. Further details can be found in Strik et al. (1997). The first aim of the experiment reported on here is to determine to what extent scores computed by our speech recognizer can predict pronunciation scores assigned by human experts. Furthermore, we wanted to determine whether asking the human experts to assign specific ratings of pronunciation quality along with global ratings would enhance our understanding of the relation between human scores and machine scores. Another aim of this experiment was to determine whether native and nonnative speakers of Dutch are evaluated in the same way by man and machine.

3. How this study differs from previous ones

In the various methods for automatic pronunciation assessment developed so far (e.g. Bernstein et al. 1990 and Neumeyer et al. 1996) different machine measures have been used for automatic scoring: HMM log-likelihood scores, timing scores, phone classification error scores and segment duration scores. More recently, phone log-posterior probability scores have also been investigated by Franco et al. (1997).

In all these studies, the validity of machine scores is established by comparing them with pronunciation scores assigned by human experts (human scores). In general, the raters are asked to assign a global pronunciation score to each of the several sentences uttered by each speaker (sentence level rating). The scores for all the sentences by one speaker are then averaged so as to obtain an overall speaker score (speaker level rating) (see Neumeyer et al. 1996 and Franco et al. 1997).
Although this procedure may seem logical at first sight, there are some problems with it.
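
For concreteness, the averaging step used in the earlier studies (sentence level ratings averaged into a speaker level rating, then compared with machine scores) might be sketched as follows. This is only an illustration under stated assumptions: the function names, the data layout and the use of a Pearson correlation as the comparison measure are hypothetical and are not taken from the studies cited.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean


def speaker_level_scores(sentence_ratings):
    """Average sentence-level ratings into one score per speaker.

    sentence_ratings: iterable of (speaker_id, rating) pairs
    (hypothetical layout, chosen for illustration only).
    """
    by_speaker = defaultdict(list)
    for speaker_id, rating in sentence_ratings:
        by_speaker[speaker_id].append(rating)
    return {spk: mean(vals) for spk, vals in by_speaker.items()}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumption: correlation is one plausible way to compare human
    and machine scores; the source only says they are "compared".
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up ratings, purely to show the call pattern.
human = speaker_level_scores([("s1", 4), ("s1", 6), ("s2", 8), ("s3", 2)])
machine = {"s1": 0.5, "s2": 0.9, "s3": 0.1}  # hypothetical machine scores
speakers = sorted(human)
r = pearson_r([human[s] for s in speakers], [machine[s] for s in speakers])
```

Averaging in this way discards any variation between the sentences of a single speaker, which is one reason to examine the procedure critically.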
[1] Yoon Kim, et al. Automatic pronunciation scoring for language instruction. 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing, 1997.
[2] Lou Boves, et al. A spoken dialog system for the Dutch public transport information service. Int. J. Speech Technol., 1997.
[3] G. H. Slusser, et al. Statistical analysis in psychology and education. 1960.
[4] Sheila Embleton, et al. Studies of error gravity: Native reactions to errors produced by Swedish learners of English. 1983.
[5] Mitch Weintraub, et al. Automatic text-independent pronunciation scoring of foreign language student speech. Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
[6] K. Koehler, et al. The Effect of Foreign Accent and Speaking Rate on Native Speaker Comprehension. 1988.
[7] J. Flege, et al. Talker and listener effects on degree of perceived foreign accent. The Journal of the Acoustical Society of America, 1992.
[8] Mitch Weintraub, et al. Automatic evaluation and training in English pronunciation. ICSLP, 1990.
[9] Joan M. Fayer, et al. Native and Nonnative Judgments of Intelligibility and Irritation. 1987.
[10] Maxine Eskénazi, et al. Detection of foreign speakers' pronunciation errors for second language training-preliminary results. Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96, 1996.
[11] John Laver, et al. An automated system for computer-aided pronunciation learning. 1994.
[12] Lou Boves, et al. The Dutch polyphone corpus. EUROSPEECH, 1995.
[13] K. Koehler, et al. The Relationship Between Native Speaker Judgments of Nonnative Pronunciation and Deviance in Segmentals, Prosody, and Syllable Structure. 1992.
[14] Faculteit der Letteren, et al. Begrijpelijkheid van buitenlanders: de rol van fonische versus niet fonische factoren. 1981.