Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints

State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentence-level models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
