Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser

We experiment with extending a lattice parsing methodology for parsing Hebrew (Goldberg and Tsarfaty, 2008; Goldberg et al., 2009) to make use of a stronger syntactic model: the PCFG-LA Berkeley Parser. We show that the methodology is very effective: using a small training set of about 5500 trees, we construct a parser that segments and parses unsegmented Hebrew text with an F-score of almost 80%, an error reduction of over 20% relative to the best previous result for this task. This result indicates that lattice parsing with the Berkeley Parser is an effective methodology for parsing over uncertain inputs.

[1] Yoav Goldberg et al. EM Can Find Pretty Good HMM POS-Taggers (When Given a Good Start), 2008, ACL.

[2] Martin Rajman et al. Lattice Parsing for Speech Recognition, 1999.

[3] Mary P. Harper et al. Self-Training PCFG Grammars with Latent Annotations Across Languages, 2009, EMNLP.

[4] Reut Tsarfaty et al. A Single Generative Model for Joint Morphological Segmentation and Syntactic Parsing, 2008, ACL.

[5] Dan Klein et al. Learning Accurate, Compact, and Interpretable Tree Annotation, 2006, ACL.

[6] Christopher D. Manning et al. Better Arabic Parsing: Baselines, Evaluations, and Analysis, 2010, COLING.

[7] Reut Tsarfaty et al. Integrated Morphological and Syntactic Disambiguation for Modern Hebrew, 2006, ACL.

[8] Yoav Goldberg et al. Language-Independent Parsing with Empty Elements, 2011, ACL.

[9] Jun'ichi Tsujii et al. Probabilistic CFG with Latent Annotations, 2005, ACL.

[10] Reut Tsarfaty et al. Enhancing Unlexicalized Parsing Performance Using a Wide Coverage Lexicon, Fuzzy Tag-Set Mapping, and EM-HMM-Based Lexical Probabilities, 2009, EACL.

[11] Yoav Goldberg et al. Unsupervised Lexicon-Based Resolution of Unknown Words for Full Morphological Analysis, 2008, ACL.

[12] Detlef Prescher et al. Inducing Head-Driven PCFGs with Latent Heads: Refining a Tree-Bank Grammar for Parsing, 2005, ECML.

[13] Khalil Sima'an et al. Building a Tree-Bank of Modern Hebrew Text, 2001.

[14] Dan Klein et al. Parsing German with Latent Variable Grammars, 2008.

[15] Yuval Krymolowski et al. Automatic Annotation of Morpho-Syntactic Dependencies in a Modern Hebrew Treebank, 2008.

[16] Alon Itai et al. Language Resources for Hebrew, 2008, Lang. Resour. Evaluation.

[17] Dan Klein et al. Accurate Unlexicalized Parsing, 2003, ACL.

[18] Khalil Sima'an et al. Modeling Morphosyntactic Agreement in Constituency-Based Parsing of Modern Hebrew, 2010, SPMRL@NAACL-HLT.

[19] Djamé Seddah et al. On Statistical Parsing of French with Supervised and Semi-Supervised Strategies, 2009.

[20] Slav Petrov et al. Coarse-to-Fine Natural Language Processing, 2011, Theory and Applications of Natural Language Processing.