Paper information - Use of Knowledge Graph in Rescoring the N-Best List in Automatic Speech Recognition

With the evolution of neural-network-based methods, the field of automatic speech recognition (ASR) has advanced to a level where building an application with a speech interface is a reality. In spite of these advances, building a real-time speech recogniser still faces several problems, such as low recognition accuracy, domain constraints, and out-of-vocabulary words. Low recognition accuracy is addressed by improving the acoustic model, the language model, or the decoder, and by rescoring the N-best list at the output of the decoder. We consider the N-best list rescoring approach to improve recognition accuracy. Most methods in the literature use the grammatical, lexical, syntactic, and semantic connections between the words in a recognised sentence as features for rescoring. In this paper, we explore the semantic relatedness between the words in a sentence as a feature to rescore the N-best list. Semantic relatedness is computed using TransE [6], a method for low-dimensional embedding of the triples in a knowledge graph. The novelty of the paper is the application of Semantic Web techniques to automatic speech recognition.
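The rescoring idea in the abstract can be illustrated with a minimal sketch. TransE models a knowledge-graph triple (h, r, t) so that h + r is close to t, which makes the distance ||h + r - t|| a natural plausibility measure. Everything else below is an assumption made for illustration and is not taken from the paper: the function names, the toy two-dimensional embeddings, the mapping from distance to a relatedness value in (0, 1], the averaging over entity pairs, and the interpolation `weight`. The sketch also assumes that the words of each hypothesis have already been linked to knowledge-graph entities.

```python
import math
from itertools import combinations


def transe_distance(head, relation, tail):
    """TransE dissimilarity ||h + r - t||_2; smaller means a more plausible triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def relatedness(entities, entity_vecs, relation_vecs):
    """Mean relatedness over all pairs of known entities, in the range [0, 1].

    Assumption (illustrative only): for each pair, try every relation in both
    directions, keep the smallest distance d, and map it to 1 / (1 + d).
    Hypotheses with fewer than two known entities get 0.0.
    """
    known = [e for e in entities if e in entity_vecs]
    pairs = list(combinations(known, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        best = min(
            min(
                transe_distance(entity_vecs[a], rel, entity_vecs[b]),
                transe_distance(entity_vecs[b], rel, entity_vecs[a]),
            )
            for rel in relation_vecs.values()
        )
        total += 1.0 / (1.0 + best)
    return total / len(pairs)


def rescore(nbest, entity_vecs, relation_vecs, weight=1.0):
    """Re-rank (text, entities, decoder_score) hypotheses, best first.

    The combined score is decoder_score + weight * relatedness; the linear
    combination and the weight are hypothetical choices.
    """
    scored = [
        (decoder_score + weight * relatedness(entities, entity_vecs, relation_vecs), text)
        for text, entities, decoder_score in nbest
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored]


# Toy, hand-made embeddings (hypothetical; real ones would be trained with TransE).
toy_entities = {
    "Berlin": [1.0, 0.0],
    "Germany": [1.0, 1.0],
    "Merlin": [-3.0, 2.0],
}
toy_relations = {"capitalOf": [0.0, 1.0]}

# Two hypotheses with made-up decoder log-scores; the decoder prefers the first.
toy_nbest = [
    ("flights to Merlin Germany", ["Merlin", "Germany"], -10.0),
    ("flights to Berlin Germany", ["Berlin", "Germany"], -10.2),
]

print(rescore(toy_nbest, toy_entities, toy_relations, weight=1.0))
```

In this toy setup the semantically coherent hypothesis overtakes the one the decoder preferred, because its entity pair forms a plausible triple under the toy embeddings. In practice the weight would need tuning on held-out data.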

Maria-Esther Vidal | Sören Auer | Christoph Schmidt | Camilo Morales | Ashwini Jaya Kumar

[1] Jinyu Li, et al. A study on knowledge source integration for candidate rescoring in automatic speech recognition, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[2] Vassilios Digalakis, et al. Combining Knowledge Sources to Reorder N-Best Speech Hypothesis Lists, 1994, HLT.

[3] Mehryar Mohri, et al. Speech Recognition with Weighted Finite-State Transducers, 2008.

[4] Mitch Weintraub, et al. Explicit word error minimization in n-best list rescoring, 1997, EUROSPEECH.

[5] Pablo N. Mendes, et al. Improving efficiency and accuracy in multilingual entity extraction, 2013, I-SEMANTICS '13.

[6] Jason Weston, et al. Translating Embeddings for Modeling Multi-relational Data, 2013, NIPS.

[7] Ngoc Thang Vu, et al. Generating exact lattices in the WFST framework, 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[8] Wen Wang, et al. N-Best Rescoring Based on Pitch-accent Patterns, 2011, ACL.

[9] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.

[10] Fuchun Peng, et al. Search results based N-best hypothesis rescoring with maximum entropy classification, 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.