Speech recognition using context conditional word posterior probabilities

In this paper, two new scoring schemes for large vocabulary continuous speech recognition are compared. Instead of using the joint probability of a word sequence and a sequence of acoustic observations, we determine the best path through a word graph using posterior word probabilities, with or without word context. The exact calculation of the posterior probability for a word sequence implies a sum over all possible word boundaries, which is approximated by a maximum operation in the standard scoring approach. The new scoring scheme using word posterior probabilities can be expected to improve recognition performance, because it involves a partial summation over word boundaries. We present experimental results on five different corpora: the Dutch Arise corpus, the German Verbmobil '98 corpus, the English North American Business '94 20k and 64k development corpora, and the English Broadcast News '96 corpus. It is shown that the Viterbi approximation within words has no effect on either standard or word-posterior-based recognition. Using word posterior probabilities with and without word context, the relative reductions in word error rate are comparable and range between 1.5% and 5%. A reason why the additional consideration of word context does not further improve recognition performance might be that the gain in word context information is traded against a decrease in the number of word sequences that contribute to a particular word posterior probability.
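For concreteness, a sketch of the quantity involved, following the standard formulation of word posteriors over a word graph in this line of work; the notation below is an assumption based on that literature and is not taken verbatim from the paper. The posterior probability of a word hypothesis $w$ with start time $\tau$ and end time $t$, given the acoustic observations $x_1^T$, sums the joint probabilities of all word sequences in the graph that contain this hypothesis:

$$p([w;\tau,t] \mid x_1^T) \;=\; \frac{\sum_{w_1^N:\,\exists n:\,(w_n,\tau_n,t_n)=(w,\tau,t)} p(x_1^T \mid w_1^N)\, p(w_1^N)}{p(x_1^T)}$$

A boundary-independent posterior for $w$ is then obtained by additionally summing $p([w;\tau,t] \mid x_1^T)$ over the boundary pairs $(\tau,t)$; this summation over word boundaries is precisely the step that the standard scoring approach replaces with a maximum operation.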
