Enabling Interactive Transcription in an Indigenous Community

We propose a novel transcription workflow that combines spoken term detection with a human in the loop, and we report a pilot experiment. This work is grounded in an almost zero-resource scenario involving two endangered languages, in which only a few terms have so far been identified. We show that in the early stages of transcription, when the available data is insufficient to train a robust ASR system, it is possible to take advantage of the transcription of a small number of isolated words to bootstrap the transcription of a speech collection.
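The abstract does not specify how spoken term detection is performed. As an illustration only, the sketch below assumes a common query-by-example approach. Each transcribed isolated word serves as an audio template, and subsequence dynamic time warping (DTW) searches for that template inside longer untranscribed utterances. The feature frames (e.g. MFCC vectors), the Euclidean frame distance, the normalisation by query length, and the names `subsequence_dtw`, `detect` and `threshold` are all hypothetical choices, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one acoustic feature vector (assumed, e.g. MFCCs)


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def subsequence_dtw(query: List[Frame], utterance: List[Frame]) -> Tuple[float, int, int]:
    """Align the whole (non-empty) query against the best region of the utterance.

    Returns (cost normalised by query length, start frame, end frame inclusive).
    The match may begin and end at any utterance frame (free start and end).
    """
    n, m = len(query), len(utterance)
    cost = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        cost[0][j] = frame_dist(query[0], utterance[j])
        start[0][j] = j  # free start: a match may begin at any frame
    for i in range(1, n):
        for j in range(m):
            # Candidate predecessors: vertical, then diagonal and horizontal steps.
            best, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                for c, s in ((cost[i - 1][j - 1], start[i - 1][j - 1]),
                             (cost[i][j - 1], start[i][j - 1])):
                    if c < best:
                        best, best_start = c, s
            cost[i][j] = frame_dist(query[i], utterance[j]) + best
            start[i][j] = best_start
    # Free end: choose the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end] / n, start[n - 1][end], end


def detect(query: List[Frame], utterances: List[List[Frame]],
           threshold: float) -> List[Tuple[int, int, int, float]]:
    """Return (utterance index, start, end, score) for matches at or below
    the hypothetical threshold, best match first."""
    hits = []
    for idx, utt in enumerate(utterances):
        score, s, e = subsequence_dtw(query, utt)
        if score <= threshold:
            hits.append((idx, s, e, score))
    return sorted(hits, key=lambda h: h[3])


if __name__ == "__main__":
    # Toy 1-D "features": the template occurs in utterance 0 at frames 2..4.
    word = [[1.0], [2.0], [3.0]]
    collection = [[[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]],
                  [[5.0], [5.0], [5.0]]]
    print(detect(word, collection, threshold=1.0))  # [(0, 2, 4, 0.0)]
```

In a human-in-the-loop workflow, candidate hits such as these would presumably be shown to a transcriber to confirm or reject, and confirmed hits could serve as additional templates for later searches. Both steps are assumptions about how such a workflow might be organised.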
