Paper Information - Lexicalized Markov Grammars for Sentence Compression

Lexicalized Markov Grammars for Sentence Compression

We present a sentence compression system based on synchronous context-free grammars (SCFG), following the successful noisy-channel approach of Knight and Marcu (2000). We define a head-driven Markovization formulation of SCFG deletion rules, which allows us to lexicalize probabilities of constituent deletions. We also use a robust approach for tree-to-tree alignment between arbitrary document-abstract parallel corpora, which lets us train lexicalized models with much more data than previous approaches relying exclusively on scarcely available document-compression corpora. Finally, we evaluate different Markovized models and find that our best model is one that exploits head-modifier bilexicalization to accurately distinguish adjuncts from complements, and that produces sentences judged more grammatical than those generated by previous work.

Kathleen McKeown | Michel Galley

[1] Richard Edwin Stearns, et al. Syntax-Directed Transduction, 1966, JACM.

[2] Alfred V. Aho, et al. Syntax Directed Translations and the Pushdown Assembler, 1969, J. Comput. Syst. Sci.

[3] Aravind K. Joshi, et al. Tree Adjunct Grammars, 1975, J. Comput. Syst. Sci.

[4] Kaizhong Zhang, et al. Simple Fast Algorithms for the Editing Distance Between Trees and Related Problems, 1989, SIAM J. Comput.

[5] Beatrice Santorini, et al. Building a Large Annotated Corpus of English: The Penn Treebank, 1993, CL.

[6] Mark Johnson, et al. PCFG Models of Linguistic Tree Representations, 1998, CL.

[7] Daniel Marcu, et al. Statistics-Based Summarization - Step One: Sentence Compression, 2000, AAAI/IAAI.

[8] Hongyan Jing, et al. Sentence Reduction for Automatic Text Summarization, 2000, ANLP.

[9] Michael Collins, et al. Head-Driven Statistical Models for Natural Language Parsing, 2003, CL.

[10] Dan Klein, et al. Accurate Unlexicalized Parsing, 2003, ACL.

[11] Richard M. Schwartz, et al. Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation, 2003, HLT-NAACL 2003.

[12] Eugene Charniak, et al. Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking, 2005, ACL.

[13] Eugene Charniak, et al. Supervised and Unsupervised Learning for Sentence Compression, 2005, ACL.

[14] Ryan T. McDonald. Discriminative Sentence Compression with Soft Syntactic Evidence , 2006, EACL.