Paper information - Discriminative sentence compression with conditional random fields

The paper focuses on a particular approach to automatic sentence compression which makes use of a discriminative sequence classifier known as Conditional Random Fields (CRF). We devise several features for CRF that allow it to incorporate information on nonlinear relations among words. Along with that, we address the issue of data paucity by collecting data from RSS feeds available on the Internet, and turning them into training data for use with CRF, drawing on techniques from biology and information retrieval. We also discuss a recursive application of CRF on the syntactic structure of a sentence as a way of improving the readability of the compression it generates. Experiments found that our approach works reasonably well compared to the state-of-the-art system [Knight, K., & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence 139, 91-107.].

Tadashi Nomoto

[1] Ryan T. McDonald. Discriminative Sentence Compression with Soft Syntactic Evidence, 2006, EACL.

[2] Akira Shimazu, et al. Probabilistic Sentence Reduction Using Support Vector Machines, 2004, COLING.

[3] Fernando Pereira, et al. Shallow Parsing with Conditional Random Fields, 2003, NAACL.

[4] Eugene Charniak, et al. Supervised and Unsupervised Learning for Sentence Compression, 2005, ACL.

[5] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[6] S. B. Needleman, et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins, 1970, Journal of Molecular Biology.

[7] Stefan Riezler, et al. Statistical Sentence Condensation using Ambiguity Packing and Stochastic Disambiguation Methods for Lexical-Functional Grammar, 2003, NAACL.

[8] M. S. Waterman, et al. Identification of common molecular subsequences, 1981, Journal of Molecular Biology.

[9] Yi Pan, et al. Sentence Compression for Automated Subtitling: A Hybrid Approach, 2004, ACL 2004.

[10] S. B. Needleman, et al. A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins, 1989.

[11] Andrew McCallum, et al. A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance, 2005, UAI.