Inducing phonetics from dialect variation

Structuralists famously observed that language is "un système où tout se tient" (Meillet, 1903, p. 407), insisting that the system of relations of linguistic units was more important than their concrete content. This study attempts to derive content from relations, in particular phonetic content from the system of alternative pronunciations used in different geographical varieties. It proceeds from data documenting language variation, examining two dialect atlases each containing the phonetic transcriptions of the same set of words at hundreds of sites. We collect the correspondences via an alignment procedure, and then apply an information-theoretic measure, pointwise mutual information, assigning smaller segment distances to segments which frequently correspond. We iterate alignment and information-theoretic distance assignment until both stabilize, and we evaluate the quality of the phonetic distances obtained by comparing them to acoustic vowel distances. For both Dutch and German, we find strong correlations between the induced pho
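The pointwise mutual information step described in the abstract can be illustrated with a minimal sketch. It assumes the alignments have already been produced and are available as a list of segment pairs. The function name `pmi_distances`, the treatment of correspondences as unordered, and the min-max rescaling to [0, 1] are illustrative assumptions, not necessarily the procedure used in the study.

```python
import math
from collections import Counter


def pmi_distances(aligned_pairs):
    """Map aligned segment pairs to PMI-based distances in [0, 1].

    Hypothetical helper: frequently corresponding segments receive
    smaller distances. Identical pairs such as ("a", "a") contribute
    to the counts but are left out of the returned table.
    """
    # Assumption: correspondences are unordered, so (a, e) == (e, a).
    pair_counts = Counter(tuple(sorted(p)) for p in aligned_pairs)

    # Count how often each segment takes part in any correspondence.
    seg_counts = Counter()
    for (x, y), n in pair_counts.items():
        seg_counts[x] += n
        seg_counts[y] += n

    total_pairs = sum(pair_counts.values())
    total_segs = sum(seg_counts.values())

    # PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) )
    pmi = {}
    for (x, y), n in pair_counts.items():
        if x == y:
            continue
        p_xy = n / total_pairs
        p_x = seg_counts[x] / total_segs
        p_y = seg_counts[y] / total_segs
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))

    if not pmi:
        return {}

    # Assumption: rescale so the highest PMI maps to distance 0
    # and the lowest PMI maps to distance 1.
    hi, lo = max(pmi.values()), min(pmi.values())
    span = (hi - lo) or 1.0
    return {pair: (hi - value) / span for pair, value in pmi.items()}


# Toy counts, invented purely for illustration.
pairs = [("a", "e")] * 6 + [("o", "a")] + [("o", "u")] * 3
distances = pmi_distances(pairs)
```

In the iterative scheme the abstract outlines, a table like `distances` would serve as the substitution costs for the next alignment pass, and the two steps would repeat until neither the alignments nor the distances change.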

[1] John Nerbonne, et al. Evaluating the Pairwise String Alignment of Pronunciations, 2009, LaTeCH - SHELT&R@EACL.

[2] John Laver, et al. Principles of Phonetics: Principles of transcription, 1994.

[3] N. Mantel. The detection of disease clustering and a generalized regression approach, 1967, Cancer research.

[4] W. Kretzschmar. Linguistic atlas of the middle and south atlantic states, 1995.

[5] John Nerbonne, et al. An Aggregate Analysis of Pronunciation in the Goeman-Taeldeman-van Reenen-Project Data, 2007.

[6] Vladimir I. Levenshtein, et al. Binary codes capable of correcting deletions, insertions, and reversals, 1965.

[7] W. Torgerson. Multidimensional scaling: I. Theory and method, 1952.

[8] R. Plomp, et al. Frequency analysis of Dutch vowels from 50 male speakers, 1973, The Journal of the Acoustical Society of America.

[9] G. A. Miller, et al. An Analysis of Perceptual Confusions Among Some English Consonants, 1955.

[10] G. Boulianne, et al. Dialektklassifikation auf der Grundlage Aggregierter Ausspracheunterschiede, 2006.

[11] John Nerbonne, et al. Inducing Sound Segment Differences Using Pair Hidden Markov Models, 2007, SIGMORPHON.

[12] Noam Chomsky, et al. The Sound Pattern of English, 1968.

[13] John J. Ohala. Comparison of speech sounds: distance vs. cost metrics, 1997.

[14] R. Plomp, et al. Frequency analysis of Dutch vowels from 25 female speakers, 1973.

[15] Johan Taeldeman, et al. Fonologie en morfologie van de Nederlandse dialecten: een nieuwe materiaalverzameling en twee nieuwe atlasprojecten, 1996.

[16] A. Meillet, et al. Introduction à l'étude comparative des langues indoeuropéennes, 1922.

[17] Jelena Prokic, et al. Families and resemblances, 2010.

[18] Wilbert Heeringa, et al. Measuring Dialect Differences, 2009.

[19] Louis Goldstein, et al. Consonant features in speech errors, 1980.

[20] Wilbert Jan Heeringa. Measuring dialect pronunciation differences using Levenshtein distance, 2004.

[21] H. Traunmüller. Analytical expressions for the tonotopic sensory scale, 1990.

[22] Kenneth Ward Church, et al. Word Association Norms, Mutual Information, and Lexicography, 1989, ACL.