Assessing agreement level between forced alignment models with data from endangered language documentation corpora

Automatic forced alignment of transcriptions to audio has achieved high levels of agreement for languages with large corpora, but it also holds great promise for work on all languages. Here, we apply two forced alignment programs to data from an endangered Mixtecan language of Mexico. Both placed a majority of boundaries within 20 ms of the hand-labeled ones. Phonemes with fairly steady-state elements (e.g. nasals, fricatives) were labeled more accurately than others. Forced alignment may thus increase the efficiency of labeling texts from smaller languages, at least where the phoneme inventories are similar to those of the languages on which the aligners were trained.
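The 20 ms criterion above can be illustrated with a minimal sketch. It assumes that automatic and hand-labeled boundaries are already paired one-to-one and expressed in seconds; the function name `agreement_within` and the sample boundary times are hypothetical and are not taken from the study.

```python
def agreement_within(auto_bounds, hand_bounds, tol=0.020):
    """Return the fraction of paired boundaries that differ by at most tol seconds."""
    if len(auto_bounds) != len(hand_bounds):
        raise ValueError("boundary lists must be paired one-to-one")
    if not auto_bounds:
        raise ValueError("no boundaries to compare")
    # Count boundaries whose absolute time difference is within the tolerance.
    hits = sum(1 for a, h in zip(auto_bounds, hand_bounds) if abs(a - h) <= tol)
    return hits / len(auto_bounds)


# Hypothetical boundary times in seconds (illustrative only, not data from the paper).
hand = [0.100, 0.250, 0.400, 0.600]
auto = [0.110, 0.245, 0.450, 0.615]

print(agreement_within(auto, hand))  # 3 of 4 boundaries fall within 20 ms -> 0.75
```

Computing the same fraction separately per phoneme class (e.g. nasals versus stops) would be one way to compare labeling accuracy across segment types.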
