Paper Information - Learning Beam Search Policies via Imitation Learning


Beam search is widely used for approximate decoding in structured prediction problems. Models often use a beam at test time but ignore its existence at train time, and therefore do not explicitly learn how to use the beam. We develop a unifying meta-algorithm for learning beam search policies using imitation learning. In our setting, the beam is part of the model and not just an artifact of approximate decoding. Our meta-algorithm captures existing learning algorithms and suggests new ones. It also lets us show novel no-regret guarantees for learning beam search policies.
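For readers unfamiliar with the decoding procedure the abstract refers to, the following is a minimal sketch of plain test-time beam search. It is not the paper's meta-algorithm and involves no learning. The names `beam_search`, `score_fn`, and `toy_model`, the `</s>` end token, and the toy probabilities are all assumptions made for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def beam_search(
    score_fn: Callable[[Prefix], Dict[str, float]],
    beam_size: int,
    max_len: int,
    eos: str = "</s>",
) -> List[Tuple[Prefix, float]]:
    """Return up to beam_size (sequence, log-probability) pairs, best first.

    score_fn is a hypothetical model interface: it maps a prefix to the
    log-probabilities of each possible next token.
    """
    beam: List[Tuple[Prefix, float]] = [((), 0.0)]
    for _ in range(max_len):
        candidates: List[Tuple[Prefix, float]] = []
        for prefix, logp in beam:
            if prefix and prefix[-1] == eos:
                # Finished hypotheses are carried over unchanged.
                candidates.append((prefix, logp))
                continue
            for token, token_logp in score_fn(prefix).items():
                candidates.append((prefix + (token,), logp + token_logp))
        # Keep only the beam_size highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_size]
        if all(p and p[-1] == eos for p, _ in beam):
            break
    return beam


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Toy next-token distribution (illustrative numbers, not from the paper)."""
    if not prefix:
        return {"a": math.log(0.6), "b": math.log(0.4)}
    if prefix[-1] == "a":
        return {"</s>": math.log(0.55), "b": math.log(0.45)}
    return {"</s>": math.log(0.9), "a": math.log(0.1)}


greedy = beam_search(toy_model, beam_size=1, max_len=5)
wider = beam_search(toy_model, beam_size=2, max_len=5)
```

In this toy setup a beam of size 1 commits to the locally best first token `a` and ends with `("a", "</s>")`, while a beam of size 2 keeps `b` alive and recovers the higher-probability sequence `("b", "</s>")`. The decoder here treats the beam purely as a test-time device, which is the situation the abstract contrasts with learning a policy that accounts for the beam during training.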

Geoffrey J. Gordon | Matthew R. Gormley | Renato Negrinho
