Evaluating two versions of the Momel pitch modelling algorithm on a corpus of read speech in Korean

The Momel algorithm automatically factors a raw fundamental frequency curve into two components: a microprosodic component, corresponding to local variations of pitch caused by the phonetic nature of the speech segments, and a macroprosodic component, corresponding to the overall pitch pattern of the utterance, which is then represented as a sequence of pitch targets. An earlier evaluation estimated the overall efficiency of the algorithm (F-measure) at around 95% on a corpus of read speech in 5 European languages and at around 93% on a corpus of spontaneous speech. In this paper we evaluate the output of two versions of the Momel algorithm against manually corrected pitch targets for a corpus of just over 2 hours of read speech in Korean (40 continuous 5-sentence passages, each read by 5 male and 5 female speakers). The results show that the new version of the Momel algorithm performs systematically better than the earlier version.
