Homophone Identification and Merging for Code-switched Speech Recognition

Code-switching, or code-mixing, is the use of multiple languages within a single utterance or conversation. Borrowing occurs when a word from a foreign language becomes part of the vocabulary of another language. In multilingual societies, switching/mixing and borrowing are not always clearly distinguishable. As a result, the transcription of code-switched and borrowed words is often not standardized, which introduces homophones into the training data. In this work, we automatically identify and disambiguate homophones in code-switched data to improve recognition of code-switched speech. We use a WX-based common pronunciation scheme for both of the languages being mixed and unify the homophones during training, which lowers the word error rate (WER) of systems built using this data. We also extend this framework to propose a metric for code-switched speech recognition that accounts for homophones in both languages when calculating WER, giving a more accurate picture of the errors an ASR system makes on code-switched speech.
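The two ideas in the abstract, unifying homophones before training and discounting homophone mismatches when scoring, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `TOY_LEXICON` is a hypothetical stand-in for the WX-based common pronunciation scheme (its keys are made up and are not real WX output), the rule of choosing the most frequent spelling as the canonical form is one plausible choice rather than the paper's stated method, and `wer` is ordinary word-level edit distance with an optional key function, not the exact metric proposed in the paper.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon: spelling variant -> shared pronunciation key.
# The keys are invented for this sketch; a real system would derive them
# from a common pronunciation scheme covering both languages.
TOY_LEXICON = {
    "school": "skUl",
    "skool": "skUl",
    "time": "tAim",
    "taim": "tAim",
}

def pron_key(word):
    """Return the pronunciation key, falling back to the word itself."""
    return TOY_LEXICON.get(word, word)

def build_canonical_map(transcripts):
    """Map every word to the most frequent spelling sharing its key."""
    counts = Counter(w for t in transcripts for w in t.split())
    groups = defaultdict(list)
    for word in counts:
        groups[pron_key(word)].append(word)
    canonical = {}
    for words in groups.values():
        # Ties are broken alphabetically so the result is deterministic.
        best = max(words, key=lambda w: (counts[w], w))
        for word in words:
            canonical[word] = best
    return canonical

def merge_homophones(transcript, canonical):
    """Rewrite a transcript so that homophones share one spelling."""
    return " ".join(canonical.get(w, w) for w in transcript.split())

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]

def wer(ref, hyp, key=lambda w: w):
    """WER after mapping each word through `key` (identity by default)."""
    ref_tokens = [key(w) for w in ref.split()]
    hyp_tokens = [key(w) for w in hyp.split()]
    if not ref_tokens:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

transcripts = ["mera school door hai", "school ka time", "skool band hai"]
canonical = build_canonical_map(transcripts)
merged = [merge_homophones(t, canonical) for t in transcripts]

plain_wer = wer("mera school band hai", "mera skool band hai")
homophone_wer = wer("mera school band hai", "mera skool band hai", key=pron_key)
```

In this toy run, `skool` is rewritten to `school` in the merged training text, and the hypothesis that differs from the reference only by a homophone is charged one substitution under plain WER but none when both sides are compared through `pron_key`.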
