Dialect Pronunciation Comparison and Spoken Word Recognition

Two adaptations of the regular Levenshtein distance algorithm are proposed, based on psycholinguistic work on spoken word recognition. The first adaptation is inspired by the Cohort model, which assumes that the word-initial part is more important for word recognition than the word-final part. The second adaptation is based on the notion that stressed syllables contain more information and are more important for word recognition than unstressed syllables. The adapted algorithms are evaluated on a large contemporary collection of Dutch dialect material, the Goeman-Taeldeman-Van Reenen-Project (GTRP, collected 1980–1995), and on a relatively small Norwegian dataset for which dialect speakers' judgments of proximity are available.
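To make the two ideas concrete, the following is a minimal sketch of a position-sensitive Levenshtein distance. It is not the authors' implementation: the abstract does not give the weighting schemes, so the linear decay in `onset_weights`, the 0.5 weight for unstressed segments in `stress_weights`, and the choice to average the two segment weights for a substitution are all assumptions made for illustration. The function names are hypothetical as well.

```python
from typing import List, Sequence


def weighted_levenshtein(a: Sequence[str], b: Sequence[str],
                         wa: Sequence[float], wb: Sequence[float]) -> float:
    """Levenshtein distance where each segment carries its own weight.

    Deleting a[i] costs wa[i], inserting b[j] costs wb[j], and substituting
    a[i] by b[j] costs the mean of the two weights (an assumption).
    """
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + wa[i - 1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + wb[j - 1]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else (wa[i - 1] + wb[j - 1]) / 2
            d[i][j] = min(d[i - 1][j] + wa[i - 1],   # deletion
                          d[i][j - 1] + wb[j - 1],   # insertion
                          d[i - 1][j - 1] + sub)     # match or substitution
    return d[n][m]


def onset_weights(n: int, floor: float = 0.5) -> List[float]:
    """Cohort-inspired weights: 1.0 at the word onset, decaying linearly
    to `floor` at the final segment (the decay shape is an assumption)."""
    if n <= 1:
        return [1.0] * n
    return [1.0 - (1.0 - floor) * i / (n - 1) for i in range(n)]


def stress_weights(stressed: Sequence[bool],
                   unstressed_weight: float = 0.5) -> List[float]:
    """Stress-inspired weights: full weight for segments in stressed
    syllables, a reduced weight (assumed 0.5) elsewhere."""
    return [1.0 if s else unstressed_weight for s in stressed]


# A word-initial difference counts more than a word-final one.
initial = weighted_levenshtein("bat", "cat", onset_weights(3), onset_weights(3))
final = weighted_levenshtein("cab", "cat", onset_weights(3), onset_weights(3))
print(initial, final)
```

With uniform weights of 1.0 the function reduces to the regular Levenshtein distance, so either weighting scheme can be compared directly against the unweighted baseline on the same transcriptions.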
