Speech recognition system for a service robot - a performance evaluation

In this work we adapt and evaluate several solutions for automatic speech recognition (ASR) to serve as a human-machine interface (HMI) for an assistant robot. Two on-device systems, Kaldi (DNN-HMM) and Mozilla's DeepSpeech (end-to-end), and three cloud service APIs, IBM Watson, Microsoft Azure, and Google Speech-to-Text, are evaluated. The systems are adapted to the domain of robot commands and evaluated on a set of expected inputs. As the goal is to retain the ability to recognise general language, the systems are also evaluated on out-of-domain data.
