Efficient Higher-Order CRFs for Morphological Tagging

Training higher-order conditional random fields is computationally prohibitive for very large tag sets. We present an approximate conditional random field that uses coarse-to-fine decoding and early updating. We show that our implementation yields fast and accurate morphological taggers across six languages with different morphological properties, and that, across languages, higher-order models give significant improvements over first-order models.
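To make the idea of coarse-to-fine decoding concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. A cheap coarse pass (assumed here to be a zero-order model that scores each tag independently) prunes every position to the tags whose posterior clears a threshold. An exact second-order Viterbi pass then runs only over the surviving candidates, so its cost depends on the pruned candidate counts rather than on the full tag set. The names `prune_tags`, `second_order_viterbi`, `trans2`, and `threshold`, the threshold-based pruning rule, and the `"<s>"` padding symbol are all hypothetical choices made for illustration. Early updating during training is not shown.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Scores = List[Dict[str, float]]  # per-position tag scores from the coarse model


def prune_tags(unary: Scores, threshold: float) -> List[List[str]]:
    """Coarse pass (assumed zero-order): keep tags whose posterior >= threshold.

    The argmax tag is always kept so that no position ends up empty.
    """
    candidates = []
    for scores in unary:
        m = max(scores.values())  # subtract the max for numerical stability
        z = sum(math.exp(s - m) for s in scores.values())
        post = {t: math.exp(s - m) / z for t, s in scores.items()}
        best = max(post, key=post.get)
        kept = [t for t, p in post.items() if p >= threshold or t == best]
        candidates.append(sorted(kept))
    return candidates


def second_order_viterbi(
    unary: Scores,
    candidates: List[List[str]],
    trans2: Callable[[str, str, str], float],
) -> Tuple[List[str], float]:
    """Fine pass: exact second-order Viterbi restricted to the pruned lattice.

    trans2(t_{i-2}, t_{i-1}, t_i) scores a tag trigram; "<s>" pads the start.
    Assumes a non-empty sentence.
    """
    n = len(candidates)
    # Each state is a tag bigram (t_{i-1}, t_i) mapped to (best score, backpointer).
    delta: Dict[Tuple[str, str], Tuple[float, Optional[Tuple[str, str]]]] = {
        ("<s>", t): (unary[0][t] + trans2("<s>", "<s>", t), None)
        for t in candidates[0]
    }
    history = [delta]
    for i in range(1, n):
        new: Dict[Tuple[str, str], Tuple[float, Optional[Tuple[str, str]]]] = {}
        for (u, v), (score, _) in delta.items():
            for t in candidates[i]:  # only surviving tags are expanded
                s = score + unary[i][t] + trans2(u, v, t)
                if (v, t) not in new or s > new[(v, t)][0]:
                    new[(v, t)] = (s, (u, v))
        delta = new
        history.append(delta)
    # Backtrack from the highest-scoring final state.
    state: Optional[Tuple[str, str]] = max(delta, key=lambda k: delta[k][0])
    best_score = delta[state][0]
    tags = []
    for i in range(n - 1, -1, -1):
        tags.append(state[1])
        state = history[i][state][1]
    return list(reversed(tags)), best_score
```

In practice the coarse model would itself be trained, and the threshold trades speed against the risk of pruning the correct tag; the fine pass is exact only with respect to the candidates that survive pruning.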
