Paper information - Linguistically Informed Post-processing for ASR Error correction in Sanskrit

We propose an ASR system for Sanskrit, a low-resource language, that combines subword tokenisation strategies with search space enrichment based on linguistic information. Specifically, to address the high proportion of out-of-vocabulary entries in the language, we first use a subword-based language model and acoustic model to generate a search space. The resulting search space is converted into a word-based search space and further enriched with morphological and lexical information from a shallow parser. Finally, the transitions in the search space are rescored using a supervised morphological parser proposed for Sanskrit. Our approach currently reports state-of-the-art results in Sanskrit ASR, reducing WER by 7.18 absolute points relative to the previous state of the art.
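The headline figure above is an absolute reduction in word error rate (WER), that is, a difference between two percentages rather than a ratio. As a minimal sketch of how such a number is typically computed, the Python below implements the standard word-level edit-distance definition of WER. The function names `word_error_rate` and `absolute_reduction` are illustrative assumptions, not part of the paper or of any toolkit it uses, and the paper's own scoring setup may differ (for example in text normalisation).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length.

    Hypothetical helper for illustration; whitespace tokenisation is an
    assumption and may not match the scoring used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Absolute point reduction: a plain difference of two WER percentages."""
    return baseline_wer - new_wer


if __name__ == "__main__":
    # Toy strings, not data from the paper: one substitution and one
    # deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))  # 50.0
```

Under this definition, an absolute reduction of 7.18 points means the new system's WER percentage is 7.18 lower than the baseline's, regardless of how large the baseline WER was.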
