Sparse Forward-Backward Using Minimum Divergence Beams for Fast Training of Conditional Random Fields

Hidden Markov models and linear-chain conditional random fields (CRFs) are applicable to many tasks in spoken language processing. In large state spaces, however, training can be expensive, because it often requires many iterations of forward-backward. Beam search is a standard heuristic for controlling complexity during Viterbi decoding, but during forward-backward, standard beam heuristics can be dangerous, because they can make training unstable. We introduce sparse forward-backward, a variational perspective on beam methods that approximates each marginal distribution by a mixture of Kronecker delta functions. This motivates a novel minimum-divergence beam criterion, which selects the beam by minimizing the KL divergence between the exact and approximate marginal distributions. Our beam selection approach is not only more efficient for Viterbi decoding, but also more stable within sparse forward-backward training. On a standard text-to-speech problem, we reduce CRF training time fourfold, from over a day to six hours, with no loss in accuracy.
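
To make the minimum-divergence criterion concrete, below is a minimal NumPy sketch of the beam-selection step as suggested by the abstract; the function name min_divergence_beam and the tolerance parameter epsilon are our own illustrative choices, not identifiers from the paper. If the marginal p is restricted to a support set B and renormalized, the resulting KL divergence is -log(sum of p over B), so the smallest beam within tolerance epsilon is the highest-probability prefix whose cumulative mass reaches exp(-epsilon).

    import numpy as np

    def min_divergence_beam(marginal, epsilon=1e-3):
        # Select the smallest set of states whose renormalized marginal
        # stays within KL divergence epsilon of the full marginal.
        # For q = marginal restricted to B and renormalized,
        # KL(q || p) = -log(sum_{i in B} p_i), so we keep the most
        # probable states until their mass reaches exp(-epsilon).
        order = np.argsort(marginal)[::-1]        # states by descending probability
        cumulative = np.cumsum(marginal[order])
        k = int(np.searchsorted(cumulative, np.exp(-epsilon))) + 1
        return order[:k]                          # indices of retained states

    # Example: a peaked marginal over five states
    p = np.array([0.70, 0.25, 0.03, 0.015, 0.005])
    print(min_divergence_beam(p, epsilon=0.05))   # keeps 3 states; mass 0.98 >= exp(-0.05)

In this sketch the beam width adapts to the shape of the marginal: a peaked distribution yields a very small beam, while a flat one retains more states, which is the property that keeps the sparse approximation stable during training.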