Paper information - Towards phoneme inventory discovery for documentation of unwritten languages

Documenting unwritten languages is a challenging task, even for trained specialists. The goal of the French-German ANR-DFG project BULB is to help linguists document new languages better and faster. To discover the phonetic inventory of a language, the project follows three steps: estimating phoneme boundaries, classifying articulatory features (AFs) for each individual segment, and clustering the segments into a phoneme inventory. In this work, we focus on estimating the phoneme boundaries and extracting AFs, but also perform a first simple clustering based on the recognized AFs. We demonstrate that our deep bidirectional LSTM-based approach to identifying phoneme boundaries achieves state-of-the-art performance, and we evaluate AF extraction based on feed-forward neural networks.

Jörg Franke | Sebastian Stüker | Alexander H. Waibel | Markus Müller

[1] Steve Renals, et al. Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR, 2012, 2012 IEEE Spoken Language Technology Workshop (SLT).

[2] Alan W. Black, et al. Automatic discovery of a phonetic inventory for unwritten languages for statistical speech synthesis, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3] Satoshi Nakamura, et al. Unsupervised Linear Discriminant Analysis for Supporting DPGMM Clustering in the Zero Resource Scenario, 2016, SLTU.

[4] Keiko Horiguchi, et al. Towards Spontaneous Speech Translation, 1994.

[5] Sanjeev Khudanpur, et al. Unsupervised Learning of Acoustic Sub-word Units, 2008, ACL.

[6] Florian Metze, et al. Articulatory features for conversational speech recognition, 2005.

[7] Satoshi Nakamura, et al. Unsupervised Phoneme Segmentation of Previously Unseen Languages, 2016, INTERSPEECH.

[8] Khe Chai Sim, et al. An investigation of augmenting speaker representations to improve speaker normalisation for DNN-based speech recognition, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[9] Odette Scharenborg, et al. Unsupervised speech segmentation: an analysis of the hypothesized phone boundaries, 2010, The Journal of the Acoustical Society of America.

[10] Markus Müller, et al. Using language adaptive deep neural networks for improved multilingual speech recognition, 2015, IWSLT.

[11] Florian Metze, et al. A flexible stream architecture for ASR using articulatory features, 2002, INTERSPEECH.

[12] Tanja Schultz, et al. Multilingual articulatory features, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[13] Jörg Franke, et al. Phoneme Boundary Detection using Deep Bidirectional LSTMs, 2016, ITG Symposium on Speech Communication.

[14] A. Waibel, et al. A one-pass decoder based on polymorphic linguistic context assignment, 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.

[15] Tanja Schultz, et al. Integrating multilingual articulatory features into speech recognition, 2003, INTERSPEECH.

[16] Finn Dag Buø, et al. JANUS 93: towards spontaneous speech translation, 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[17] Thomas Pellegrini, et al. Inferring Phonemic Classes from CNN Activation Maps Using Clustering Techniques, 2016, INTERSPEECH.

[18] A. Waibel, et al. Towards Improving Low-Resource Speech Recognition Using Articulatory and Language Features, 2016, IWSLT.

[19] Sebastian Stüker, et al. Innovative technologies for under-resourced language documentation: The BULB Project, 2016.

[20] Roberto Gretter. Euronews: a multilingual benchmark for ASR and LID, 2014, INTERSPEECH.

[21] Aren Jansen, et al. A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge, 2015, INTERSPEECH.

[22] Sebastian Stüker, et al. Language Adaptive DNNs for Improved Low Resource Speech Recognition, 2016, INTERSPEECH.

[23] Sebastian Stüker, et al. Language Feature Vectors for Resource Constraint Speech Recognition, 2016, ITG Symposium on Speech Communication.

[24] Aren Jansen, et al. The zero resource speech challenge 2017, 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).