Robust PCFG-Based Generation Using Automatically Acquired LFG Approximations

We present a novel PCFG-based architecture for robust probabilistic generation based on wide-coverage LFG approximations (Cahill et al., 2004) automatically extracted from treebanks. The generator maximises the probability of a tree given an f-structure. We evaluate the approach with string-based metrics. On the Penn-II WSJ Section 23 sentences of length ≤20, we currently achieve coverage of 95.26%, a BLEU score of 0.7227 and string accuracy of 0.7476.
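The abstract does not define string accuracy. As a minimal sketch, assuming it is the edit-distance-based measure often called simple string accuracy (one minus the token-level edit distance divided by the reference length), a score for one sentence could be computed as follows. The function names `edit_distance` and `simple_string_accuracy` are illustrative and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Token-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def simple_string_accuracy(reference, generated):
    """Assumed definition: 1 - edit_distance / reference length, over whitespace tokens."""
    ref = reference.split()
    hyp = generated.split()
    if not ref:
        raise ValueError("reference sentence must not be empty")
    return 1.0 - edit_distance(ref, hyp) / len(ref)
```

Under this assumed definition, a generated string identical to the reference scores 1.0, and the score can fall below zero when the generated string is much longer than the reference.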

[1]  Irene Langkilde.  Forest-Based Statistical Sentence Generation, 2000, ANLP.

[2]  Stephan Oepen, et al.  Maximum Entropy Models for Realization Ranking, 2005.

[3]  Ronald M. Kaplan, et al.  Lexical Functional Grammar: A Formal System for Grammatical Representation, 2004.

[4]  Srinivas Bangalore, et al.  Impact of Quality and Quantity of Corpora on Stochastic Generation, 2001, EMNLP.

[5]  Stephan Oepen, et al.  High Efficiency Realization for a Wide-Coverage Unification Grammar, 2005, IJCNLP.

[6]  Andy Way, et al.  Treebank-Based Acquisition of Multilingual Unification Grammar Resources, 2005.

[7]  Salim Roukos, et al.  Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[8]  Ann Bies, et al.  The Penn Treebank: Annotating Predicate Argument Structure, 1994, HLT.

[9]  Martin Kay, et al.  Chart Generation, 1996, ACL.

[10]  Irene Langkilde-Geary, et al.  An Empirical Verification of Coverage and Correctness for a General-Purpose Sentence Generator, 2002, INLG.

[11]  Adwait Ratnaparkhi, et al.  Trainable Methods for Surface Natural Language Generation, 2000, ANLP.

[12]  Jun'ichi Tsujii, et al.  Probabilistic Models for Disambiguation of an HPSG-Based Chart Generator, 2005, IWPT.

[13]  Irene Langkilde-Geary, et al.  Forest-Based Statistical Sentence Generation, 2000, ANLP.

[14]  Steven P. Abney.  Stochastic Attribute-Value Grammars, 1996, CL.

[15]  Ronald M. Kaplan, et al.  LFG Generation Produces Context-free Languages, 2000, COLING.

[16]  Srinivas Bangalore, et al.  Exploiting a Probabilistic Hierarchical Model for Generation, 2000, COLING.

[17]  Anja Belz, et al.  Statistical Generation: Three Methods Compared and Evaluated, 2005, ENLG.

[18]  Andy Way, et al.  Treebank-Based Acquisition of a Chinese Lexical-Functional Grammar, 2004, PACLIC.

[19]  Andy Way, et al.  Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations, 2004, ACL.

[20]  Richard Sproat, et al.  Estimating Lexical Priors for Low-Frequency Morphologically Ambiguous Forms, 1996, Comput. Linguistics.

[21]  Andy Way, et al.  Automatic acquisition of Spanish LFG resources from the Cast3LB treebank, 2005.