Paper information - Speech data retrieval system constructed on a universal phonetic code domain

Speech data retrieval system constructed on a universal phonetic code domain

We propose a novel speech processing framework in which all speech data are encoded into universal phonetic code (UPC) sequences, and speech processing systems such as speech recognition, retrieval, and digesting are constructed on this UPC domain. As the first step, we introduce a sub-phonetic segment (SPS) set, based on the International Phonetic Alphabet (IPA), to deal with multilingual speech, and we develop a procedure to estimate acoustic models of the SPS from IPA-like phone models. The key point of the framework is to incorporate environment adaptation into the SPS encoding stage. This makes it possible to normalize acoustic variations and to extract the language factor contained in speech signals as encoded SPS sequences. We confirm these characteristics by constructing a speech retrieval system on the SPS domain. The system can retrieve key phrases, given by speech, from speech data recorded in different environments, under a vocabulary-free condition. We show several preliminary experimental results for this system, using Japanese and English sentence speech sets.
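To make the idea of retrieval in a symbolic code domain concrete, the following is a minimal sketch, not the authors' method. It assumes that both the spoken key phrase and the speech data have already been encoded into sequences of discrete symbols, and it locates the phrase by approximate substring matching with unit-cost edit distance. The function name `spot_phrase`, the unit costs, and the example symbols are all hypothetical; the abstract does not specify the matching algorithm or the SPS inventory.

```python
from typing import Sequence, Tuple


def spot_phrase(query: Sequence[str], data: Sequence[str]) -> Tuple[int, int]:
    """Find the best approximate occurrence of `query` inside `data`.

    Returns (distance, end_index), where `distance` is the smallest
    unit-cost edit distance between `query` and any contiguous stretch
    of `data`, and `end_index` is the 1-based position in `data` at
    which that best match first ends.

    Hypothetical illustration only: unit substitution, insertion, and
    deletion costs are an assumption, not taken from the paper.
    """
    # prev[i] = cost of aligning query[:i] against an empty stretch of data
    prev = list(range(len(query) + 1))
    best = (prev[-1], 0)

    for j, symbol in enumerate(data, start=1):
        # A match may start at any position in data, so row 0 costs nothing.
        cur = [0]
        for i, q in enumerate(query, start=1):
            substitution = prev[i - 1] + (0 if q == symbol else 1)
            skip_data_symbol = prev[i] + 1
            skip_query_symbol = cur[i - 1] + 1
            cur.append(min(substitution, skip_data_symbol, skip_query_symbol))
        # Keep the earliest end position with the lowest distance.
        if cur[-1] < best[0]:
            best = (cur[-1], j)
        prev = cur

    return best


if __name__ == "__main__":
    # Made-up symbols standing in for encoded segments.
    phrase = ["k", "a", "t"]
    recording = ["s", "o", "k", "e", "t", "o"]
    print(spot_phrase(phrase, recording))
```

Because the comparison operates only on symbol sequences, no word list is needed, which is one way to read the vocabulary-free condition described above. A practical system would likely replace the 0/1 substitution cost with a distance between segment models, but that choice is outside what the abstract states.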

Yoshiaki Itoh | K. Tanaka | H. Kojima | N. Fujimura
