Linguistically Informed Post-processing for ASR Error Correction in Sanskrit

We propose an ASR system for Sanskrit, a low-resource language, that combines subword tokenisation strategies with search-space enrichment using linguistic information. Specifically, to address the high rate of out-of-vocabulary entries in the language, we first use a subword-based language model and acoustic model to generate a search space. This search space is then converted into a word-based search space and further enriched with morphological and lexical information from a shallow parser. Finally, the transitions in the search space are rescored using a supervised morphological parser proposed for Sanskrit. Our approach reports state-of-the-art results in Sanskrit ASR, with an absolute reduction of 7.18 points in word error rate (WER) over the previous state of the art.
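To make the reported metric concrete, the following is a minimal sketch of how word error rate and an absolute point reduction are conventionally computed. It is not the authors' evaluation code: the function names `word_error_rate` and `absolute_wer_reduction` are hypothetical, the tokens are toy placeholders, and whitespace tokenisation is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Assumption: words are separated by whitespace. The edit distance counts
    substitutions, deletions and insertions with unit cost.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute reduction in WER points (not a relative percentage)."""
    return baseline_wer - new_wer


# Toy example with placeholder tokens (not data from the paper).
toy_reference = "ramah vanam gacchati sada"
toy_hypothesis = "ramah balam gacchati"
toy_wer = word_error_rate(toy_reference, toy_hypothesis)
```

In this toy case the hypothesis has one substituted word and one missing word against a four-word reference. An absolute reduction is the plain difference between two WER values in percentage points, which is how the 7.18-point figure in the abstract should be read.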
