SPONTANEOUS SPEECH RECOGNITION FOR ROMANIAN IN SPOKEN DIALOGUE SYSTEMS

Laboratoire Informatique d’Avignon, University of Avignon, France
Corresponding author: Corneliu BURILEANU, E-mail: cburileanu@messnet.pub.ro

In this paper we present an attempt to develop a speech recognition module for the Romanian language, intended for use in a dialogue system. We first discuss the main characteristics of such a dialogue system. We then explain the design and acquisition of a spontaneous speech database for training the decoder, covering the design guidelines followed in developing the database, several practical issues encountered, and some triphone balancing statistics. Next, the speech recognition architecture (based on components of the Hidden Markov Model Toolkit, HTK) is described in detail, with emphasis on its two aspects, training and decoding. The following section discusses several preliminary recognition results, highlighting current limitations and the need to significantly increase the size of the database. A set of conclusions and perspectives is offered at the end of the paper.

Key words: Continuous speech recognition; Speech database; Hidden Markov Modeling.
