Paper information - Constructing speech processing systems on universal phonetic codes accompanied with reference acoustic models

Constructing speech processing systems on universal phonetic codes accompanied with reference acoustic models

This paper proposes a novel speech processing framework in which all speech data are first encoded into universal phonetic code (UPC) sequences, and speech processing systems, such as speech recognition, retrieval, and digesting, are then constructed in this UPC domain. We first introduce an IPA-based sub-phonetic segment (SPS) set as the UPC in order to handle multilingual speech. In the UPC (SPS) domain, each UPC is accompanied by a reference acoustic model that is independent of the actual acoustic models used in the encoding process. Processing in the UPC domain, such as recognition, is based on the distance between UPC sequences, which is estimated using the reference acoustic models. We validate the proposed framework by constructing a speech recognition system and a vocabulary-free speech retrieval system in the SPS domain, and we report several experimental results for these systems on Japanese and English speech data sets.
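The idea of comparing code sequences through per-symbol reference models can be illustrated with a small sketch. The abstract does not specify the distance measure, so everything here is an assumption made for illustration: each symbol's reference model is reduced to a single toy mean vector (`REFERENCE`), the symbol-to-symbol cost is the Euclidean distance between those vectors, insertions and deletions carry a fixed `GAP_COST`, and the sequence distance is a generic weighted edit distance. The names `symbol_distance`, `sequence_distance`, and `recognize` are hypothetical and do not come from the paper.

```python
from math import dist

# Hypothetical reference models: one toy mean vector per code symbol.
# These values are invented for illustration and are not from the paper.
REFERENCE = {
    "a": (1.0, 0.0),
    "i": (0.0, 1.0),
    "k": (3.0, 3.0),
    "t": (3.0, 4.0),
}

GAP_COST = 2.0  # assumed fixed cost for inserting or deleting one symbol


def symbol_distance(x, y):
    """Euclidean distance between the reference vectors of two symbols."""
    return dist(REFERENCE[x], REFERENCE[y])


def sequence_distance(seq_a, seq_b):
    """Weighted edit distance between two code sequences (dynamic programming)."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    table = [[0.0] * cols for _ in range(rows)]
    # Aligning against an empty sequence costs one gap per symbol.
    for i in range(1, rows):
        table[i][0] = i * GAP_COST
    for j in range(1, cols):
        table[0][j] = j * GAP_COST
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = min(
                table[i - 1][j] + GAP_COST,  # delete a symbol of seq_a
                table[i][j - 1] + GAP_COST,  # insert a symbol of seq_b
                table[i - 1][j - 1]
                + symbol_distance(seq_a[i - 1], seq_b[j - 1]),  # substitute
            )
    return table[-1][-1]


def recognize(observed, lexicon):
    """Return the lexicon entry whose code sequence is closest to the observed one."""
    return min(lexicon, key=lambda word: sequence_distance(observed, lexicon[word]))
```

With this toy setup, an observed sequence such as `["t", "a"]` is matched to whichever lexicon entry has the smallest accumulated distance, so acoustically close symbols (here `k` and `t`) are penalized less than distant ones. A real system would presumably derive the symbol distances from full acoustic models rather than single vectors.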

Kazuyo Tanaka | Yoshiaki Itoh | Hiroaki Kojima | Nahoko Fujimura

[1] Kazuyo Tanaka, et al. A method of extracting time-varying acoustic features effective for speech recognition, 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2] Roberto Gemello, et al. Hybrid HMM-NN modeling of stationary-transitional units for continuous speech recognition, 2000, Inf. Sci.

[3] Kazuyo Tanaka, et al. Automatic labeling and digesting for lecture speech utilizing repeated speech by shift CDP, 2001, INTERSPEECH.

[4] Karen Spärck Jones, et al. Unconstrained keyword spotting using phone lattices with application to spoken document retrieval, 1997, Comput. Speech Lang.

[5] Kazuyo Tanaka, et al. A demiphoneme network representation of speech and automatic labeling techniques for speech data base construction, 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6] Kazuyo Tanaka, et al. Between-word distance calculation in a symbolic domain and its applications to speech recognition, 2000, Inf. Sci.

[7] Kazuyo Tanaka, et al. A speech recognition method with a language-independent intermediate phonetic code, 2000, INTERSPEECH.