Evaluation of crowdsourcing transcriptions for African languages

We evaluate the quality of speech transcriptions acquired through crowdsourcing for developing ASR acoustic models (AMs) for under-resourced languages. We developed AMs for Swahili and Amharic using both reference (REF) transcriptions and crowdsourced (TRK) transcriptions. Although the Amharic transcription task took much longer to complete than the Swahili one, the recognition systems built from the REF and TRK transcriptions achieve nearly identical word error rates (40.1 vs. 39.6 for Amharic and 38.0 vs. 38.5 for Swahili). Moreover, the character-level disagreement rates between REF and TRK are only 3.3% for Amharic and 6.1% for Swahili. We conclude that quality transcriptions for under-resourced languages can be acquired from the crowd using Amazon's Mechanical Turk. Given this potential, we also point out legal and ethical issues that should be considered.
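The abstract does not spell out how the character-level disagreement between REF and TRK transcriptions is computed. A minimal sketch is given below, assuming disagreement is measured as character-level Levenshtein (edit) distance normalized by the reference length; the function names and the toy example strings are illustrative only, not taken from the paper's corpora.

```python
# Sketch: character-level disagreement between a reference (REF) transcription
# and a crowdsourced (TRK) transcription, assumed here to be Levenshtein
# distance over characters divided by the reference length.

def char_edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings at the character level."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def disagreement_rate(ref: str, hyp: str) -> float:
    """Character-level disagreement: edit distance / reference length."""
    return char_edit_distance(ref, hyp) / max(len(ref), 1)


if __name__ == "__main__":
    ref = "habari za asubuhi"   # toy Swahili example, not from the corpus
    trk = "habari za asubui"    # hypothetical crowdsourced transcription
    print(f"disagreement: {disagreement_rate(ref, trk):.1%}")
```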
