Iterative Viterbi A* Algorithm for K-Best Sequential Decoding

Sequential modeling is widely used in important applications such as named entity recognition and shallow parsing. However, as more real-time, large-scale tagging applications arise, decoding speed has become a bottleneck for existing sequential tagging algorithms. In this paper we propose 1-best A*, 1-best iterative A*, k-best A*, and k-best iterative Viterbi A* algorithms for sequential decoding. We demonstrate the efficiency of the proposed algorithms on five NLP tagging tasks. In particular, we show that iterative Viterbi A* decoding can be several times to orders of magnitude faster than the state-of-the-art algorithm on tagging tasks with a large number of labels, which makes real-time, large-scale tagging applications with thousands of labels feasible.
