Comparing Languages Using Hierarchical Prosodic Analysis

We present a novel, data-driven approach to assessing similarities and differences among a group of languages based on purely prosodic characteristics, namely the f0 and energy envelope signals. These signals are decomposed using the continuous wavelet transform; the resulting components represent f0 and energy patterns on three levels of the prosodic hierarchy, roughly corresponding to syllables, words and phrases. Unigram language models, with states derived from a combination of ∆-features obtained from these components, are trained and compared using a mutual perplexity measure. In this pilot study, we apply the approach to a small corpus of spoken material from seven languages (Estonian, Finnish, Hungarian, German, Swedish, Russian and Slovak) with a rich history of mutual language contact. We present similarity trees (dendrograms) derived from models that use the hierarchically decomposed prosodic signals separately as well as in combination, and compare them with patterns obtained from non-decomposed signals. We show that (1) plausible similarity patterns, reflecting language family relationships and the known contact history, can be obtained even from a relatively small data set, and (2) the hierarchical decomposition approach using both f0 and energy provides the most comprehensive results.
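
The model comparison step described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the state definition (signs of the first differences of two contours), the add-one smoothing, and the symmetrised average used as "mutual perplexity" are all assumptions made for illustration, and the function names `delta_states`, `train_unigram`, `perplexity` and `mutual_perplexity` are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Assumed state inventory: every combination of rise/flat/fall in two contours.
VOCAB = list(product((-1, 0, 1), repeat=2))


def delta_states(f0, energy):
    """Turn two contours into discrete states from the signs of their deltas.

    Assumption: a state is the pair (sign of delta-f0, sign of delta-energy).
    """
    def sign(x):
        return (x > 0) - (x < 0)

    n = min(len(f0), len(energy))
    return [(sign(f0[i + 1] - f0[i]), sign(energy[i + 1] - energy[i]))
            for i in range(n - 1)]


def train_unigram(states, alpha=1.0):
    """Estimate unigram state probabilities with additive smoothing (assumed)."""
    counts = Counter(states)
    total = len(states) + alpha * len(VOCAB)
    return {s: (counts[s] + alpha) / total for s in VOCAB}


def perplexity(model, states):
    """Perplexity of a unigram model on a state sequence."""
    log_prob = sum(math.log(model[s]) for s in states)
    return math.exp(-log_prob / len(states))


def mutual_perplexity(model_a, states_a, model_b, states_b):
    """Symmetrised cross-perplexity; one possible reading of 'mutual perplexity'."""
    return 0.5 * (perplexity(model_a, states_b) + perplexity(model_b, states_a))
```

Computing `mutual_perplexity` for every pair of languages gives a symmetric distance matrix, which could then be passed to any standard agglomerative clustering routine to draw a dendrogram.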
