Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System

This paper outlines recent developments in multilingual medical data and a multilingual speech recognition system for network-based speech-to-speech translation in the medical domain. The overall speech-to-speech translation (S2ST) system was designed to translate spoken utterances from a given source language into a target language in order to facilitate multilingual conversations and reduce the problems caused by language barriers in medical situations. Our final system utilizes weighted finite-state transducers with n-gram language models. Currently, the system covers three languages: Japanese, English, and Chinese. We discuss the difficulties involved in connecting the Japanese, English, and Chinese speech recognition systems through Web servers, and present experimental results on simulated medical conversations.
