Factors Affecting the Accuracy of Korean Parsing

We investigate parsing accuracy on the Korean Treebank 2.0 with a number of different grammars. Comparisons among these grammars and to their English counterparts suggest different aspects of Korean that contribute to parsing difficulty. Our results indicate that the coarseness of the Treebank's nonterminal set is an even greater problem than in the English Treebank. We also find that Korean's relatively free word order does not impact parsing results as much as one might expect; instead, the prevalence of zero pronouns accounts for a large portion of the difference between Korean and English parsing scores.
