Spoken Term Detection Methods for Sparse Transcription in Very Low-resource Settings

We investigate the efficiency of two very different spoken term detection approaches for transcription when the available data is insufficient to train a robust ASR system. This work is grounded in a very low-resource language documentation scenario, where only a few minutes of recordings have been transcribed for a given language so far. Experiments on two oral languages show that a pretrained universal phone recognizer, fine-tuned with only a few minutes of target-language speech, can be used for spoken term detection with better overall performance than a dynamic time warping approach. In addition, we show that representing phoneme recognition ambiguity in a graph structure can further boost recall while maintaining high precision in this low-resource spoken term detection task.
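
To make the comparison concrete, here is a minimal sketch of a DTW-based query-by-example detector of the kind the abstract uses as a baseline. This is not the paper's exact system: the feature type (e.g. MFCC frames), the cosine frame distance, and the `hop` and `threshold` parameters are illustrative assumptions.

```python
import numpy as np

def dtw_cost(query, segment):
    """Normalized DTW alignment cost between two (time, dim) feature
    sequences (e.g. MFCC frames). Lower cost means more similar."""
    # Pairwise cosine distances between frames.
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-8)
    s = segment / (np.linalg.norm(segment, axis=1, keepdims=True) + 1e-8)
    dist = 1.0 - q @ s.T
    n, m = len(query), len(segment)
    # Standard DTW recursion with unit insertion/deletion/match steps.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = dist[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    # Normalize by path length so costs are comparable across lengths.
    return acc[n, m] / (n + m)

def detect(query_feats, utterance_feats, hop=5, threshold=0.35):
    """Slide the query over an utterance and report low-cost windows."""
    hits = []
    n = len(query_feats)
    for start in range(0, len(utterance_feats) - n + 1, hop):
        window = utterance_feats[start:start + 2 * n]  # allow slower speech
        cost = dtw_cost(query_feats, window)
        if cost < threshold:
            hits.append((start, cost))
    return hits
```

The graph-based variant instead searches the phone recognizer's output while keeping competing phoneme hypotheses rather than only the 1-best string. Below is a minimal sketch under the assumption that this output is summarized as a confusion-network-like sequence of slots, each mapping candidate phonemes to posterior probabilities; a real lattice or confusion network would also handle insertions and deletions (e.g. via epsilon arcs), which this sketch omits.

```python
import math

def confusion_net_score(query, net):
    """Best log-probability of reading `query` through consecutive slots.

    query: list of phonemes, e.g. ["t", "a", "m"].
    net:   confusion-network-like output: a list of slots, each a dict
           mapping candidate phonemes to their posterior probabilities.
    Returns -inf if the query matches nowhere.
    """
    best = -math.inf
    for start in range(len(net) - len(query) + 1):
        score = 0.0
        for k, ph in enumerate(query):
            p = net[start + k].get(ph, 0.0)
            if p <= 0.0:
                break  # phoneme absent from this slot: no match here
            score += math.log(p)
        else:
            best = max(best, score)
    return best

# A query can still match when the right phoneme is only a lower-ranked
# hypothesis in a slot, which is what boosts recall over 1-best decoding:
net = [{"t": 0.6, "d": 0.4}, {"a": 0.9, "e": 0.1}, {"n": 0.7, "m": 0.3}]
print(confusion_net_score(["t", "a", "m"], net))  # matches via "m" at 0.3
```

The example illustrates the recall/precision trade-off the abstract describes: the 1-best string here is "t a n", which would miss the query "t a m", while the graph keeps the match alive at a lower score, and a score threshold preserves precision.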
