Syntactic Parse Fusion

Model combination techniques have consistently shown state-of-the-art performance across multiple tasks, including syntactic parsing. However, they dramatically increase runtime and can be difficult to employ in practice. We demonstrate that applying constituency model combination techniques to the n-best list of a single parser, instead of to the outputs of n different parsers, yields significant improvements in parsing accuracy. Parses are weighted by their probabilities and combined using an adapted version of the method of Sagae and Lavie (2006). These accuracy gains come at marginal computational cost and are obtained on top of existing parsing techniques such as discriminative reranking and self-training, resulting in state-of-the-art accuracy: 92.6% on WSJ section 23. On out-of-domain corpora, accuracy improves by 0.4% on average. We empirically confirm that six well-known n-best parsers benefit from the proposed methods across six domains.
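The weighting-and-combination idea can be illustrated with a minimal sketch. It is not the adapted Sagae and Lavie (2006) procedure itself, which reparses over weighted constituents; it is a simpler thresholded-voting variant. The function name `fuse_nbest`, the `(label, start, end)` constituent encoding, the plain normalization of probabilities, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def fuse_nbest(nbest, threshold=0.5):
    """Fuse an n-best list by probability-weighted constituent voting.

    nbest: list of (log_prob, constituents) pairs, where constituents is a
           set of (label, start, end) tuples for one candidate parse.
    Returns the set of constituents whose total weight exceeds `threshold`.
    """
    # Turn log probabilities into weights that sum to 1
    # (subtracting the max keeps the exponentials numerically stable).
    max_lp = max(lp for lp, _ in nbest)
    exps = [math.exp(lp - max_lp) for lp, _ in nbest]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Each constituent accumulates the weight of every parse containing it.
    scores = defaultdict(float)
    for weight, (_, constituents) in zip(weights, nbest):
        for constituent in constituents:
            scores[constituent] += weight

    # Keep only constituents backed by more than `threshold` of the weight.
    return {c for c, s in scores.items() if s > threshold}


# Hypothetical 3-best list for a three-word sentence (toy probabilities).
nbest = [
    (math.log(0.5), {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3)}),
    (math.log(0.3), {("S", 0, 3), ("NP", 0, 2), ("VP", 2, 3)}),
    (math.log(0.2), {("S", 0, 3), ("NP", 0, 1), ("VP", 1, 3), ("NP", 2, 3)}),
]
fused = fuse_nbest(nbest)
print(sorted(fused))
```

With a threshold strictly above 0.5, two crossing brackets can never both survive, because no single parse contains both and their weights therefore sum to at most 1. Lower thresholds trade precision for recall and would need an explicit reparsing step to guarantee a well-formed tree.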

[1] Alexandra Kinyon, et al. Building a Treebank for French, 2000, LREC.

[2] Kevin Knight, et al. Combining Constituent Parsers, 2009, NAACL.

[3] M. Maamouri, et al. The Penn Arabic Treebank: Building a Large-Scale Annotated Arabic Corpus, 2004.

[4] Dan Klein, et al. Learning Accurate, Compact, and Interpretable Tree Annotation, 2006, ACL.

[5] Robert Dale, et al. Charting Democracy Across Parsers, 2007, ALTA.

[6] Eugene Charniak, et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, 2005, ACL.

[7] John J. Godfrey, et al. SWITCHBOARD: telephone speech corpus for research and development, 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[8] Sabine Brants, et al. The TIGER Treebank, 2001.

[9] Mark Johnson, et al. Reranking the Berkeley and Brown Parsers, 2010, HLT-NAACL.

[10] Reut Tsarfaty, et al. Introducing the SPMRL 2014 Shared Task on Parsing Morphologically-rich Languages, 2014.

[11] Brian Roark, et al. MAP adaptation of stochastic grammars, 2006, Comput. Speech Lang.

[12] Haizhou Li, et al. K-Best Combination of Syntactic Parsers, 2009, EMNLP.

[13] Joakim Nivre, et al. Analyzing and Integrating Dependency Parsers, 2011, CL.

[14] Alon Lavie, et al. Parser Combination by Reparsing, 2006, NAACL.

[15] Liang Huang, et al. Forest Reranking: Discriminative Parsing with Non-Local Features, 2008, ACL.

[16] Mary P. Harper, et al. Self-Training with Products of Latent Variable Grammars, 2010, EMNLP.

[17] Slav Petrov, et al. Products of Random Latent Variable Grammars, 2010, NAACL.

[18] Andrew Y. Ng, et al. Parsing with Compositional Vector Grammars, 2013, ACL.

[19] Christopher D. Manning, et al. Better Arabic Parsing: Baselines, Evaluations, and Analysis, 2010, COLING.

[20] Dan Klein, et al. Accurate Unlexicalized Parsing, 2003, ACL.

[21] 婁 子匡, et al. 甲寅 = The tiger, 1914.

[22] Jun'ichi Tsujii, et al. GENIA corpus - a semantically annotated corpus for bio-textmining, 2003, ISMB.

[23] Eugene Charniak, et al. Effective Self-Training for Parsing, 2006, NAACL.

[24] Hiroyuki Shindo, et al. Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing, 2012, ACL.

[25] Hopkins University, Baltimore. Exploiting Diversity in Natural Language Processing: Combining Parsers, 1999.

[26] Josef van Genabith, et al. Parser Evaluation and the BNC: Evaluating 4 constituency parsers with 3 metrics, 2008, LREC.

[27] Wojciech Skut, et al. An Annotation Scheme for Free Word Order Languages, 1997, ANLP.

[28] Shashi Narayan, et al. Diversity in Spectral Learning for Natural Language Parsing, 2015, EMNLP.

[29] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[30] Josef van Genabith, et al. QuestionBank: Creating a Corpus of Parse-Annotated Questions, 2006, ACL.

[31] Daniel Zeman, et al. Improving Parsing Accuracy by Combining Diverse Dependency Parsers, 2005, IWPT.

[32] Andrew McCallum, et al. Combining joint models for biomedical event extraction, 2012, BMC Bioinformatics.