Combining pattern-based CRFs and weighted context-free grammars

We consider two models for the sequence labeling (tagging) problem. The first is a {\em Pattern-Based Conditional Random Field} (\PB), in which the energy of a string (chain labeling) $x=x_1\ldots x_n\in D^n$ is a sum of terms over intervals $[i,j]$, where each term is non-zero only if the substring $x_i\ldots x_j$ equals a prespecified word (pattern) $w\in \Lambda$. The second is a {\em Weighted Context-Free Grammar} (\WCFG), a model frequently used in natural language processing. \PB and \WCFG encode local and non-local interactions, respectively, and can therefore be viewed as complementary. We propose a {\em Grammatical Pattern-Based CRF model} (\GPB) that combines the two in a natural way. We argue that it has certain advantages over existing approaches such as the {\em Hybrid model} of Bened{\'\i} and S{\'a}nchez, which combines {\em $N$-grams} and \WCFG{}s. This paper focuses on the complexity of inference tasks in a \GPB, such as MAP inference. We present a polynomial-time algorithm for general \GPB{}s and a faster version for a special case that we call {\em Interaction Grammars}.
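The \PB energy described above can be illustrated with a small sketch. It is a toy under stated assumptions and is not the paper's model or algorithm. It assumes the following:

* Each pattern $w\in\Lambda$ carries a single position-independent cost. The general \PB allows interval-dependent terms.
* The usual convention $p(x)\propto\exp(-E(x))$ holds, so a MAP labeling is a minimum-energy labeling.
* The function names `pb_energy` and `brute_force_map` are hypothetical.

The exhaustive search is exponential in $n$. It is meant only as a reference for checking faster inference code on tiny inputs.

```python
from itertools import product
from typing import Dict, Iterable, Sequence, Tuple

Pattern = Tuple[str, ...]


def pb_energy(x: Sequence[str], costs: Dict[Pattern, float]) -> float:
    """Energy of labeling x under a toy pattern-based CRF.

    Assumption (not from the paper): each pattern w in Lambda (the keys of
    `costs`) has one position-independent cost. Every interval [i, j] whose
    substring x_i..x_j equals w adds costs[w]; all other intervals add 0.
    """
    n = len(x)
    max_len = max((len(w) for w in costs), default=0)
    energy = 0.0
    for i in range(n):
        # Intervals longer than the longest pattern can never match.
        for j in range(i, min(n, i + max_len)):
            energy += costs.get(tuple(x[i:j + 1]), 0.0)
    return energy


def brute_force_map(n: int, domain: Iterable[str], costs: Dict[Pattern, float]):
    """Exhaustive minimum-energy labeling over D^n (exponential; tiny n only)."""
    best = min(product(domain, repeat=n), key=lambda x: pb_energy(x, costs))
    return best, pb_energy(best, costs)


# Toy instance: D = {a, b}, Lambda = {a, b, ab}.
costs = {("a",): 1.0, ("b",): 0.5, ("a", "b"): -2.0}
print(pb_energy(("a", "b", "a", "b"), costs))  # -1.0
print(brute_force_map(4, "ab", costs))         # (('a', 'b', 'a', 'b'), -1.0)
```

The inner loop is bounded by the longest pattern length $L$, so one energy evaluation costs $O(nL)$ dictionary lookups.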

[1] José-Miguel Benedí et al. RNA Modeling by Combining Stochastic Context-Free Grammars and n-Gram Models. Int. J. Pattern Recognit. Artif. Intell., 2002.

[2] Giuseppe F. Italiano et al. Experimental analysis of dynamic all pairs shortest path algorithms. SODA '04, 2004.

[3] Dan Wu et al. Conditional Random Fields with High-Order Features for Sequence Labeling. NIPS, 2009.

[4] Joan-Andreu Sánchez et al. Combination Of N-Grams And Stochastic Context-Free Grammars For Language Modeling. COLING, 2000.

[5] Camil Demetrescu et al. A new approach to dynamic all pairs shortest paths. 2004.

[6] Xuedong Huang et al. A unified context-free grammar and n-gram model for spoken language processing. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2000.

[7] Xuanjing Huang et al. Sparse higher order conditional random fields for improved sequence labeling. ICML '09, 2009.

[8] Sebastian Nowozin et al. Structured Learning and Prediction in Computer Vision. Found. Trends Comput. Graph. Vis., 2011.

[9] Thorsten Joachims et al. Learning structural SVMs with latent variables. ICML '09, 2009.

[10] Giuseppe F. Italiano et al. A new approach to dynamic all pairs shortest paths. JACM, 2004.

[11] Don Coppersmith et al. Matrix multiplication via arithmetic progressions. STOC, 1987.

[12] Toby Walsh et al. The Weighted CfgConstraint. CPAIOR, 2008.

[13] Ronald L. Rivest et al. Introduction to Algorithms. 1990.

[14] Andrew V. Goldberg et al. Scaling algorithms for the shortest paths problem. SODA '93, 1995.

[15] Vladimir Kolmogorov et al. Inference Algorithms for Pattern-Based CRFs on Sequence Data. Algorithmica, 2015.

[16] David R. Karger et al. Finding the Hidden Path: Time Bounds for All-Pairs Shortest Paths. SIAM J. Comput., 1993.

[17] Alfred V. Aho et al. The Design and Analysis of Computer Algorithms. 1974.

[18] Zhenisbek Assylbekov et al. Patterns Versus Characters in Subword-Aware Neural Language Modeling. ICONIP, 2017.

[19] Uri Zwick et al. All pairs shortest paths using bridging sets and rectangular matrix multiplication. JACM, 2000.

[20] Toby Walsh et al. The weighted Grammar constraint. Ann. Oper. Res., 2011.

[21] Jay Earley et al. An efficient context-free parsing algorithm. Commun. ACM, 1970.

[22] Yehoshua Bar-Hillel et al. Language and information: selected essays on their theory and application. 1965.

[23] Wee Sun Lee et al. Semi-Markov Conditional Random Field with High-Order Features. 2011.