Towards phoneme inventory discovery for documentation of unwritten languages

Documenting unwritten languages is a challenging task, even for trained specialists. The French-German ANR-DFG project BULB aims to help linguists document new languages better and faster. To discover the phonetic inventory of a language, the project follows three steps: estimating phoneme boundaries, classifying articulatory features (AFs) for each individual segment, and clustering the segments into a phoneme inventory. In this work, we focus on estimating phoneme boundaries and extracting AFs, but we also perform a first simple clustering based on the recognized AFs. We demonstrate that our deep bidirectional LSTM-based approach to identifying phoneme boundaries achieves state-of-the-art performance, and we evaluate AF extraction based on feed-forward neural networks.
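The third step, clustering segments by their recognized AFs, can be illustrated with a minimal sketch. It is not the method used in this work: the feature names, the 0.5 threshold, the toy scores, and the helper functions `binarize` and `cluster_segments` are all assumptions made for illustration. The sketch assumes that boundaries and per-segment AF scores are already available, and it simply groups segments whose binarized AF vectors are identical.

```python
from collections import defaultdict

# Hypothetical AF names; the abstract does not list the actual feature set.
AF_NAMES = ("voiced", "nasal", "plosive", "vowel")


def binarize(scores, threshold=0.5):
    """Turn per-segment AF scores into a tuple of 0/1 decisions."""
    return tuple(int(s >= threshold) for s in scores)


def cluster_segments(segments, threshold=0.5):
    """Group segments whose binarized AF vectors are identical.

    segments: list of (start_sec, end_sec, af_scores) tuples.
    Returns a dict mapping each AF signature to its (start, end) spans.
    """
    clusters = defaultdict(list)
    for start, end, scores in segments:
        clusters[binarize(scores, threshold)].append((start, end))
    return dict(clusters)


# Toy input (invented values): segment boundaries plus one score per AF.
segments = [
    (0.00, 0.08, [0.9, 0.1, 0.2, 0.8]),
    (0.08, 0.15, [0.1, 0.0, 0.9, 0.1]),
    (0.15, 0.24, [0.8, 0.2, 0.1, 0.9]),
    (0.24, 0.30, [0.9, 0.7, 0.1, 0.2]),
]

inventory = cluster_segments(segments)
for signature, members in inventory.items():
    active = [name for name, bit in zip(AF_NAMES, signature) if bit]
    print(active, len(members))
```

Each distinct AF signature acts as one candidate entry in the inventory. Exact matching is the simplest possible grouping; a distance-based clustering over the raw scores would tolerate classifier noise better.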
