Paper information - Homophone Identification and Merging for Code-switched Speech Recognition

Code-switching, or code-mixing, is the use of multiple languages in a single utterance or conversation. Borrowing occurs when a word from a foreign language becomes part of the vocabulary of another language. In multilingual societies, switching/mixing and borrowing are not always clearly distinguishable. As a result, the transcription of code-switched and borrowed words is often not standardized, which leads to homophones (different written forms of the same spoken word) in the training data. In this work, we automatically identify and disambiguate homophones in code-switched data to improve recognition of code-switched speech. We use a common WX-based pronunciation scheme for both languages being mixed and unify the homophones during training, which results in a lower word error rate (WER) for systems built using this data. We also extend this framework to propose a metric for code-switched speech recognition that takes homophones in both languages into account when calculating WER, which can give a more accurate picture of the errors an ASR system makes on code-switched speech.
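The abstract describes two steps: mapping words from both languages to a shared WX-based pronunciation so that differently written forms of the same word collide, and then either merging those forms in the training transcripts or treating them as equivalent when scoring. The Python sketch below illustrates both ideas under several assumptions that do not come from the paper: the Devanagari-to-WX table covers only a subset of the scheme, Roman-script words get their WX key from a small hand-written lexicon (`roman_lexicon`), the canonical form of each homophone group is simply its most frequent spelling, and `homophone_aware_wer` is a simplified stand-in for the proposed metric rather than its exact definition. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict

# Illustrative SUBSET of the Devanagari -> WX mapping (not the full WX scheme).
CONSONANTS = {
    "क": "k", "ख": "K", "ग": "g", "घ": "G", "च": "c", "छ": "C", "ज": "j", "झ": "J",
    "ट": "t", "ठ": "T", "ड": "d", "ढ": "D", "ण": "N", "त": "w", "थ": "W", "द": "x",
    "ध": "X", "न": "n", "प": "p", "फ": "P", "ब": "b", "भ": "B", "म": "m", "य": "y",
    "र": "r", "ल": "l", "व": "v", "श": "S", "ष": "R", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "A", "इ": "i", "ई": "I", "उ": "u", "ऊ": "U",
          "ए": "e", "ऐ": "E", "ओ": "o", "औ": "O"}
MATRAS = {"ा": "A", "ि": "i", "ी": "I", "ु": "u", "ू": "U",
          "े": "e", "ै": "E", "ो": "o", "ौ": "O"}
VIRAMA, ANUSVARA = "्", "ं"


def devanagari_to_wx(word):
    """Transliterate a Devanagari word to WX (subset only; keeps the inherent 'a')."""
    out = []
    for ch in word:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch] + "a")  # consonant plus inherent vowel
        elif ch in MATRAS and out:
            out[-1] = out[-1][:-1] + MATRAS[ch]  # vowel sign replaces inherent 'a'
        elif ch == VIRAMA and out:
            out[-1] = out[-1][:-1]  # virama removes inherent 'a'
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch == ANUSVARA:
            out.append("M")
        else:
            out.append(ch)  # unmapped characters pass through unchanged
    return "".join(out)


def pronunciation_key(word, roman_lexicon):
    """Map a word in either script to a shared WX-style key."""
    if any("\u0900" <= ch <= "\u097f" for ch in word):
        return devanagari_to_wx(word)
    # HYPOTHETICAL: Roman-script words are looked up in a hand-made WX lexicon.
    return roman_lexicon.get(word.lower(), word.lower())


def build_canonical_map(corpus_tokens, roman_lexicon):
    """Group spellings that share a key; map each to the group's most frequent spelling."""
    counts = Counter(corpus_tokens)
    clusters = defaultdict(list)
    for word in counts:
        clusters[pronunciation_key(word, roman_lexicon)].append(word)
    canonical = {}
    for spellings in clusters.values():
        # Most frequent spelling wins; ties go to the smallest string.
        best = min(spellings, key=lambda w: (-counts[w], w))
        for w in spellings:
            canonical[w] = best
    return canonical


def word_errors(ref, hyp):
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def homophone_aware_wer(ref, hyp, canonical):
    """WER computed after mapping both sides to canonical spellings."""
    def norm(tokens):
        return [canonical.get(t, t) for t in tokens]
    return word_errors(norm(ref), norm(hyp)) / len(ref)


# Toy usage with a hypothetical corpus and lexicon.
lexicon = {"phone": "Pona", "school": "skUla", "kya": "kyA"}
corpus = ["मेरा", "फोन", "phone", "phone", "स्कूल", "school", "kya", "क्या", "क्या"]
canonical = build_canonical_map(corpus, lexicon)
ref = ["मेरा", "फोन", "school"]
hyp = ["मेरा", "phone", "स्कूल"]
plain_wer = word_errors(ref, hyp) / len(ref)
merged_wer = homophone_aware_wer(ref, hyp, canonical)
```

On this toy example, plain WER counts two substitutions (2/3) because `फोन`/`phone` and `स्कूल`/`school` differ only in script, while the homophone-aware score is 0. Applying the same `canonical` map to training transcripts corresponds to the merging step; picking the most frequent spelling keeps the vocabulary change small, but other selection rules would be equally plausible.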

Sunayana Sitaram | Brij Mohan Lal Srivastava
