Any Domain Parsing: Automatic Domain Adaptation for Natural Language Parsing

Current efforts in syntactic parsing are largely data-driven. These methods require labeled examples of syntactic structures in order to learn the statistical patterns governing those structures. Labeled data typically requires expert annotators, which makes it both time-consuming and costly to produce. Furthermore, once training data has been created for one textual domain, its portability to other domains, even similar ones, is limited. This domain dependence has inspired a large body of work, since syntactic parsing aims to capture syntactic patterns across an entire language rather than within a single domain. The simplest approach to domain adaptation is to assume that the target domain is essentially the same as the source domain, so that no additional knowledge about the target domain is included. A more realistic approach assumes that only raw text from the target domain is available. This assumption lends itself well to semi-supervised learning methods, since these use both labeled and unlabeled examples. This dissertation focuses on a family of semi-supervised methods called self-training. Self-training creates semi-supervised learners from existing supervised learners with minimal effort. We first show results on self-training for constituency parsing within a single domain. While self-training has failed in this setting in the past, we present a simple modification which allows it to succeed, producing state-of-the-art results for English constituency parsing. Next, we show that self-training is beneficial when parsing across domains and helps further when raw text is available from the target domain. One remaining issue is that one must choose a training corpus appropriate for the target domain, or performance may be severely impaired. Humans can do this in some situations, but the strategy becomes less practical as data sets grow larger. We present a technique, Any Domain Parsing, which automatically detects useful source domains and mixes them together to produce a customized parsing model. The resulting models perform almost as well as the best seen parsing models (oracle) for each target domain. As a result, we have a fully automatic syntactic constituency parser which can produce high-quality parses for all types of text, regardless of domain.
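
The generic self-training idea mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: a toy nearest-centroid classifier over numbers stands in for a supervised parser, and the names `train_centroids`, `predict`, and `self_train` are hypothetical. It does not reproduce the dissertation's parser, nor the modification the abstract refers to.

```python
from collections import defaultdict


def train_centroids(examples):
    """Toy supervised learner: one mean value per label from (value, label) pairs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in examples:
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Return the label whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


def self_train(labeled, unlabeled, rounds=1):
    """Wrap the supervised learner: label raw data with the current model,
    add the automatically labeled examples to the training set, and retrain."""
    model = train_centroids(labeled)
    for _ in range(rounds):
        auto_labeled = [(value, predict(model, value)) for value in unlabeled]
        model = train_centroids(labeled + auto_labeled)
    return model


# Hypothetical data: two hand-labeled points plus raw, unlabeled points.
labeled = [(0.0, "a"), (10.0, "b")]
unlabeled = [1.0, 2.0, 8.0, 9.0]
model = self_train(labeled, unlabeled)
```

The supervised learner is reused unchanged; only the outer loop, which feeds the model's own output back in as extra training data, is new. That is the sense in which self-training builds a semi-supervised learner from a supervised one with little effort.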
