Stream weight tuning in dynamic Bayesian networks

In this paper we present a family of algorithms for estimating stream weights in dynamic Bayesian networks with multiple observation streams. For the two-stream case, we present a weight tuning algorithm that is optimal in the minimum classification error sense. We compare the algorithms with brute-force search where feasible and with previously published algorithms, and show that they perform as well as brute-force search and outperform the previously published algorithms. We test the stream weight tuning algorithm in the context of speech recognition with distinctive feature tandem models. Finally, we analyze how the criterion used for weight tuning differs from the word error rate criterion that is standard in speech recognition.
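To make the brute-force baseline concrete, the following is a minimal sketch of a grid search over a single weight for two streams, scored by classification error count. It is not the tuning algorithm proposed in the paper. It assumes, for illustration only, that the streams are combined as a weighted sum of per-class log-likelihoods with weights `lam` and `1 - lam`; the function names and the toy data are hypothetical.

```python
# Assumption: two streams are combined as lam * logp_a + (1 - lam) * logp_b.
# All names and the toy data below are hypothetical, for illustration only.

def combined_score(logp_a, logp_b, lam):
    """Weighted sum of the two streams' log-likelihoods for one class."""
    return lam * logp_a + (1.0 - lam) * logp_b

def error_count(samples, lam):
    """Count samples whose highest-scoring class differs from the label."""
    errors = 0
    for logps_a, logps_b, label in samples:
        scores = [combined_score(a, b, lam) for a, b in zip(logps_a, logps_b)]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        if predicted != label:
            errors += 1
    return errors

def brute_force_weight(samples, steps=100):
    """Try lam = 0, 1/steps, ..., 1 and keep the first weight with fewest errors."""
    best_lam, best_err = 0.0, error_count(samples, 0.0)
    for i in range(1, steps + 1):
        lam = i / steps
        err = error_count(samples, lam)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err

# Toy data: (per-class log-likelihoods from stream A, from stream B, true class).
# Stream A is reliable here and stream B is misleading.
TOY_SAMPLES = [
    ([-1.0, -3.0], [-2.0, -1.0], 0),
    ([-3.0, -1.0], [-1.0, -2.0], 1),
]

if __name__ == "__main__":
    lam, err = brute_force_weight(TOY_SAMPLES)
    print(f"best weight for stream A: {lam:.2f}, errors: {err}")
```

The cost of such a search grows exponentially with the number of streams, which is consistent with the abstract's remark that brute-force search is only used where feasible.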
