Automatic Detection of Syllable Boundaries in Spontaneous Speech

This paper describes the design and performance of an automatic syllable boundary detection system. Phonemes are syllabified by a rule-based system implemented as a Java program. Phonemes are grouped into six classes. A set of specific rules is developed and divided into general rules, which apply in all cases, and exception rules, which apply only in specific situations. These rules were designed for a corpus of spontaneous French speech. The phonemes, classes and rules are listed in an external configuration file of the tool (released under the GPL licence). This makes the tool easy to adapt to a specific corpus: rules, the phoneme encoding or the phoneme classes can be added or modified simply by supplying a new configuration file. Finally, performance is evaluated and compared with three other French syllabification systems, showing significant improvements. The system's output agrees with the expert's syllabification for most syllable boundaries in our corpus.
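The abstract does not list the six classes or the actual rules, so the following Java sketch only illustrates the general idea of class-driven syllabification with general and exception rules. Everything concrete in it is a hypothetical assumption, not the tool's configuration: the class letters (V vowel, G glide, L liquid, N nasal, F fricative, O occlusive), the SAMPA-like phoneme symbols, the general rule (one intervocalic consonant starts the next syllable; with two or more, only the last one does) and the single exception rule (a cluster ending in occlusive + liquid stays whole in the onset).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: classes, symbols and rules are illustrative assumptions,
// not the configuration of the tool described in the abstract.
public class Syllabifier {

    // Phoneme -> class letter. In the described tool this mapping is read
    // from an external configuration file; here it is hard-coded.
    private static final Map<String, Character> PHONEME_CLASS = new HashMap<>();

    static {
        assign('V', "a", "e", "E", "i", "o", "O", "u", "y", "2", "9", "@");
        assign('G', "j", "w", "H");
        assign('L', "l", "R");
        assign('N', "m", "n", "J");
        assign('F', "f", "v", "s", "z", "S", "Z");
        assign('O', "p", "b", "t", "d", "k", "g");
    }

    private static void assign(char cls, String... phonemes) {
        for (String p : phonemes) {
            PHONEME_CLASS.put(p, cls);
        }
    }

    // Hypothetical exception rules, checked in order before the general rule:
    // if the intervocalic class sequence ends with one of these suffixes,
    // the whole suffix goes to the onset of the next syllable.
    private static final String[] ONSET_CLUSTER_EXCEPTIONS = {"OL"};

    static char classOf(String phoneme) {
        Character c = PHONEME_CLASS.get(phoneme);
        if (c == null) {
            throw new IllegalArgumentException("Unknown phoneme: " + phoneme);
        }
        return c;
    }

    // Number of intervocalic consonants that start the next syllable.
    static int onsetLength(String clusterClasses) {
        for (String suffix : ONSET_CLUSTER_EXCEPTIONS) {
            if (clusterClasses.endsWith(suffix)) {
                return suffix.length(); // exception rule applies
            }
        }
        // General rule: V.V and V.CV put everything in the onset (0 or 1
        // consonant); with two or more consonants only the last one moves.
        return Math.min(clusterClasses.length(), 1);
    }

    static List<String> syllabify(List<String> phonemes) {
        // Vowels are the syllable nuclei.
        List<Integer> nuclei = new ArrayList<>();
        for (int i = 0; i < phonemes.size(); i++) {
            if (classOf(phonemes.get(i)) == 'V') {
                nuclei.add(i);
            }
        }
        List<String> syllables = new ArrayList<>();
        if (nuclei.isEmpty()) {
            syllables.add(String.join("", phonemes));
            return syllables;
        }
        int start = 0; // leading consonants join the first syllable
        for (int k = 0; k + 1 < nuclei.size(); k++) {
            int left = nuclei.get(k);
            int right = nuclei.get(k + 1);
            StringBuilder cluster = new StringBuilder();
            for (int i = left + 1; i < right; i++) {
                cluster.append(classOf(phonemes.get(i)));
            }
            int boundary = right - onsetLength(cluster.toString());
            syllables.add(String.join("", phonemes.subList(start, boundary)));
            start = boundary;
        }
        // Trailing consonants join the last syllable.
        syllables.add(String.join("", phonemes.subList(start, phonemes.size())));
        return syllables;
    }

    public static void main(String[] args) {
        System.out.println(syllabify(List.of("a", "p", "R", "E")));           // [a, pRE]
        System.out.println(syllabify(List.of("p", "a", "R", "t", "i", "R"))); // [paR, tiR]
    }
}
```

Keeping the class table and the exception list as plain data, rather than as branches in the code, mirrors the design choice stated in the abstract: adapting the syllabifier to another corpus or phoneme encoding would only mean changing those tables, which the real tool loads from its configuration file.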
