Scoring Children's Foreign Language Pronunciation

Measures derived from automatic speech recognition have been investigated as scores of segmental pronunciation quality. In an experiment, context-independent hidden Markov phone models were trained on native English and native Swedish read child speech, respectively. Among the scores studied, a likelihood ratio between the score of forced alignment using English phoneme models and the score of English or Swedish phoneme recognition correlated most strongly with human judgments. The best measures can assess a child's coarse proficiency level but need to be improved for detailed diagnostics of individual utterances and phonemes.
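
As a rough illustration of the kind of score described above, the following Python sketch computes a log-likelihood ratio between a forced-alignment score and a free phoneme-recognition score, and then correlates such scores with human ratings. It is a minimal sketch under stated assumptions, not the procedure used in the experiment: the function names, the per-frame normalization, the direction of the ratio (forced alignment minus recognition, in the log domain), and the use of Pearson correlation are all assumptions, and the numbers in the demo are invented toy values, not results.

```python
import math


def llr_score(forced_loglik, recog_loglik, n_frames):
    """Per-frame log-likelihood ratio for one utterance.

    forced_loglik: log-likelihood of forced alignment to the target
                   transcription (hypothetical input from a recognizer).
    recog_loglik:  log-likelihood of unconstrained phoneme recognition
                   on the same audio (hypothetical input).
    n_frames:      number of acoustic frames; dividing by it is an
                   assumed normalization so that utterances of
                   different lengths are comparable.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    return (forced_loglik - recog_loglik) / n_frames


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented toy values: (forced log-lik, recognition log-lik, frames).
    utterances = [(-1200.0, -1100.0, 200), (-900.0, -880.0, 150), (-1500.0, -1300.0, 250)]
    human_ratings = [2.0, 4.0, 1.0]  # invented ratings, higher = better
    scores = [llr_score(f, r, n) for f, r, n in utterances]
    print(scores, pearson(scores, human_ratings))
```

Because unconstrained recognition can always match or exceed the constrained alignment under the same models, a score defined this way is typically at most zero, with values closer to zero indicating that the expected phonemes fit the audio nearly as well as the best free hypothesis.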