GRAPHICAL MODEL REPRESENTATIONS OF WORD LATTICES

We introduce a method for expressing word lattices within a dynamic graphical model. We describe several ways of doing this, including a technique that relaxes the time information associated with lattice nodes, trading off hypothesis expansion against the presumed accuracy of the segmentation boundaries. Our approach encodes the lattice with a set of time-inhomogeneous, algorithmically expressed conditional probability tables. The approach was implemented as part of the graphical model toolkit, and word error rate improvements on the Switchboard corpus indicate that the technique is a viable means of incorporating large state-space speech recognition systems into a graphical model.
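The abstract does not give the construction in detail, so the following is only a minimal sketch of how a time-relaxed lattice transition table might be computed on demand. The toy lattice, the names `NODE_FRAME`, `EDGES`, and `transition_support`, and the symmetric `slack` window are all assumptions made for illustration; they are not taken from the paper or from any toolkit API.

```python
from collections import defaultdict

# Hypothetical toy lattice: node id -> frame index of the node's time stamp.
NODE_FRAME = {0: 0, 1: 10, 2: 12, 3: 25}

# Hypothetical lattice edges: (source node, destination node, word label).
EDGES = [(0, 1, "the"), (0, 2, "a"), (1, 3, "cat"), (2, 3, "cap")]

# Index outgoing edges by source node for fast lookup.
OUT = defaultdict(list)
for src, dst, word in EDGES:
    OUT[src].append((dst, word))


def transition_support(node, frame, slack):
    """Return the (dst, word) successors of `node` that may be entered at `frame`.

    A successor is allowed only if `frame` lies within `slack` frames of the
    successor's time stamp. The result depends on the frame index, so the
    table is time-inhomogeneous, and it is computed rather than stored,
    so it is algorithmic (assumed interpretation of the abstract).
    """
    return [
        (dst, word)
        for dst, word in OUT[node]
        if abs(frame - NODE_FRAME[dst]) <= slack
    ]


if __name__ == "__main__":
    # slack = 0 keeps the lattice's own segmentation boundaries.
    print(transition_support(0, 10, slack=0))
    # A larger slack admits more hypotheses at the same frame.
    print(transition_support(0, 11, slack=1))
```

With `slack=0` only the boundaries recorded in the lattice are permitted; increasing `slack` lets more successors become reachable at a given frame, which is the trade-off between hypothesis expansion and trust in the segmentation boundaries that the abstract describes.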
