Machine learning of probabilistic phonological pronunciation rules from the Italian CLIPS corpus

We propose combining phonological concepts with technical analysis to improve the modeling and understanding of phonological processes. Starting from the manual segmentation and labeling of the Italian CLIPS corpus, we automatically derive a probabilistic set of phonological pronunciation rules. A new alignment technique maps the phonological form of spontaneous sentences onto the phonetic surface form; a machine-learning algorithm then calculates a set of phonological replacement rules together with their conditional probabilities. The resulting probabilistic rule set is critically analyzed and discussed with regard to regional Italian accents. The rule set is also applied in the newly published web service WebMAUS, which allows a user to segment and phonetically label Italian speech via a simple web interface.

Index Terms: Italian, CLIPS, pronunciation, machine learning, dialect, MAUS
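The two steps named in the abstract, aligning the phonological form with the surface form and then estimating conditional probabilities for replacement rules, can be illustrated with a minimal sketch. It is not the alignment technique or learning algorithm used in this work: the generic `difflib.SequenceMatcher` from the Python standard library stands in for the aligner, the function name `learn_rules` is hypothetical, the rule context is assumed to be one segment on each side, and the two transcription pairs are invented toy data rather than CLIPS material.

```python
from collections import Counter
from difflib import SequenceMatcher


def learn_rules(pairs):
    """Estimate P(variant | left, segment, right) from (canonical, surface) pairs."""
    seen = Counter()     # how often each (left, segment, right) context occurs
    applied = Counter()  # how often that context surfaces as a given variant
    for canonical, surface in pairs:
        padded = ["#"] + list(canonical) + ["#"]  # "#" marks utterance boundaries
        for i in range(len(canonical)):
            seen[tuple(padded[i:i + 3])] += 1
        # Stand-in aligner (assumption): generic edit alignment, not the paper's method.
        matcher = SequenceMatcher(a=canonical, b=surface, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only single-segment substitutions and deletions;
            # insertions and multi-segment edits are ignored in this sketch.
            if tag == "equal" or i2 - i1 != 1:
                continue
            context = tuple(padded[i1:i1 + 3])
            applied[(context, " ".join(surface[j1:j2]))] += 1
    # Relative frequency of each rule given that its context occurred.
    return {rule: n / seen[rule[0]] for rule, n in applied.items()}


# Invented toy data: intervocalic /k/ realized as [h] in one of two utterances.
pairs = [
    (["l", "a", "k", "a", "s", "a"], ["l", "a", "h", "a", "s", "a"]),
    (["l", "a", "k", "a", "s", "a"], ["l", "a", "k", "a", "s", "a"]),
]
rules = learn_rules(pairs)
for ((left, segment, right), variant), prob in rules.items():
    print(f"{segment} -> {variant or 'deleted'} / {left} _ {right}  p={prob:.2f}")
```

On this toy input the only learned rule is /k/ to [h] between two /a/ segments, with probability 0.5, because the context occurs twice and the replacement once. A deletion would appear as an empty variant string.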
