Evaluating phonemic transcription of low-resource tonal languages for language documentation

Transcribing speech is an important part of language documentation, yet speech recognition technology has not been widely harnessed to aid linguists. We explore the use of a neural network architecture with the connectionist temporal classification (CTC) loss function for phonemic and tonal transcription in a language documentation setting. Within this framework, we compare jointly modelling phonemes and tones against modelling them separately, and assess the importance of pitch information versus phonemic context for tonal prediction. Experiments on two tonal languages, Yongning Na and Eastern Chatino, show how recognition performance changes as training data is scaled from 10 minutes up to 50 minutes for Chatino, and up to 224 minutes for Na. We discuss the findings from incorporating this technology into the linguistic workflow for documenting Yongning Na, which show the method’s promise in improving efficiency, minimizing typographical errors, and maintaining the transcription’s faithfulness to the acoustic signal, while highlighting phonetic and phonemic facts for linguistic consideration.
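
To make the modelling setup concrete, the sketch below shows a bidirectional LSTM acoustic model trained with the CTC loss over a label inventory that can hold phonemes alone or phonemes and tones together (the joint setting). This is a minimal illustration in PyTorch, not the system used in the paper; the class name, feature dimension, hidden size, and label count are placeholder assumptions.

```python
# Minimal sketch (not the paper's implementation): a bidirectional LSTM
# acoustic model trained with the CTC loss, where the label inventory can
# contain phonemes alone or phonemes and tones together (the joint setting).
# Dimensions and label counts below are illustrative placeholders.
import torch
import torch.nn as nn

class CTCTranscriber(nn.Module):
    def __init__(self, num_feats=41, hidden=250, num_labels=60):
        super().__init__()
        self.lstm = nn.LSTM(num_feats, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        # Project to num_labels + 1 classes; index 0 is the CTC blank symbol.
        self.proj = nn.Linear(2 * hidden, num_labels + 1)

    def forward(self, feats):
        # feats: (batch, time, num_feats) filterbank (optionally + pitch) features
        out, _ = self.lstm(feats)
        return self.proj(out).log_softmax(dim=-1)

model = CTCTranscriber()
ctc_loss = nn.CTCLoss(blank=0)

# Dummy batch: 2 utterances of 100 frames; target label sequences of length 20 and 15.
feats = torch.randn(2, 100, 41)
targets = torch.randint(1, 61, (35,))      # concatenated targets (20 + 15 labels)
input_lengths = torch.tensor([100, 100])
target_lengths = torch.tensor([20, 15])

log_probs = model(feats).transpose(0, 1)   # CTCLoss expects (time, batch, classes)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

In this framing, the choice between joint and separate modelling amounts to whether tone labels are folded into the same output inventory as the phonemes or predicted by a second model trained on its own label set.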
