Recent work on a preselection module for a flexible large vocabulary speech recognition system in telephone environment

At ICSLP’96 we presented a flexible, large-vocabulary, speaker-independent, isolated-word preselection system for the telephone environment, based on a two-stage, bottom-up strategy [1]. It achieved reasonable performance on large and very large vocabulary tasks, ranging from 1,200 to 10,000 words. In this paper, we describe recent work on the system in two directions. The first is handling non-speech sounds in the speech signal; we consider lip, breathing and click noises. The second is making the preselection lists variable in length, to reduce the average computational load. We model non-speech sounds because they are crucial in real-life conditions, where they cause endpointing errors and increase error rates. For the variable-length lists, we aim to use any available system parameter to estimate the preselection list length, and we have applied both parametric and non-parametric methods.
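The abstract does not specify how the list length is computed. The sketch below shows one possible non-parametric approach. It assumes a held-out set where, for each utterance, we know the rank of the correct word in the preselection list and one scalar system parameter. The bin edges, the `coverage` target and the class name `DynamicListLength` are all hypothetical illustrations, not the authors' method. For each parameter bin, the list length is set to the empirical quantile of the correct word's rank, so that a chosen fraction of held-out utterances would still have the correct word inside the list.

```python
import bisect
import math
from collections import defaultdict


def empirical_quantile(values, q):
    """Smallest observed value v such that at least a fraction q of values are <= v."""
    ordered = sorted(values)
    # Small epsilon guards against float round-off in q * n (e.g. 0.95 * 20).
    idx = max(0, math.ceil(q * len(ordered) - 1e-9) - 1)
    return ordered[idx]


class DynamicListLength:
    """Hypothetical non-parametric list-length estimator.

    Maps a scalar system parameter (assumed, e.g. utterance length in frames)
    to a preselection list length, using per-bin quantiles of the 1-based rank
    of the correct word observed on held-out data.
    """

    def __init__(self, bin_edges, coverage=0.95, max_len=10000):
        self.bin_edges = sorted(bin_edges)
        self.coverage = coverage        # target fraction of utterances covered
        self.max_len = max_len          # never exceed the full vocabulary
        self.table = {}
        self.default = max_len          # fallback before fitting / for empty bins

    def _bin(self, param):
        # Bin index: 0 for param < first edge, len(edges) for param >= last edge.
        return bisect.bisect_right(self.bin_edges, param)

    def fit(self, params, ranks):
        by_bin = defaultdict(list)
        for p, r in zip(params, ranks):
            by_bin[self._bin(p)].append(r)
        self.table = {b: empirical_quantile(rs, self.coverage) for b, rs in by_bin.items()}
        # Bins unseen in training fall back to the global quantile.
        self.default = empirical_quantile(list(ranks), self.coverage)
        return self

    def length_for(self, param):
        return min(self.table.get(self._bin(param), self.default), self.max_len)


if __name__ == "__main__":
    params = [10, 12, 15, 40, 45, 50]   # hypothetical parameter values
    ranks = [1, 2, 3, 20, 30, 40]       # rank of the correct word per utterance
    est = DynamicListLength([30], coverage=1.0).fit(params, ranks)
    print(est.length_for(20), est.length_for(100))
```

Running the script prints `3 40`: utterances in the low-parameter bin get a short list, while the others get a long one. A parametric variant would replace the per-bin table with a fitted model of the rank distribution as a function of the parameter. That gives smoother estimates at the cost of a distributional assumption.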

[1] Javier Macías Guarasa et al. Initial evaluation of a preselection module for a flexible large vocabulary speech recognition system in telephone environment. Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96), 1996.

[2] Pietro Laface et al. Lexical access to large vocabularies for speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1989.

[3] Luis A. Hernández Gómez et al. Context-dependent units for vocabulary-independent Spanish speech recognition. Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), 1996.

[4] Javier Macías Guarasa et al. Comparison of three approaches to phonetic string generation for large vocabulary speech recognition. ICSLP, 1994.

[5] Alex Acero et al. The VESTEL telephone speech database. ICSLP, 1994.

[6] Javier Macías Guarasa et al. On the development of a dictation machine for Spanish: DIVO. ICSLP, 1994.