Discriminative Sentence Compression with Soft Syntactic Evidence

We present a model for sentence compression that uses a discriminative large-margin learning framework coupled with a novel feature set defined on compressed bigrams as well as on deep syntactic representations provided by auxiliary dependency and phrase-structure parsers. These parsers are trained out-of-domain, so their output contains a significant amount of noise. We argue that the discriminative nature of the learning algorithm allows the model to weight features relative to this noise and to optimize compression accuracy directly. This differs from current state-of-the-art models (Knight and Marcu, 2000), which treat noisy parse trees, for both compressed and uncompressed sentences, as gold standard when calculating model parameters.
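To make the learning setup concrete, the sketch below illustrates a margin-based online learner with features defined on compressed bigrams. It is a toy simplification under stated assumptions, not the paper's implementation: the feature templates, the exhaustive candidate enumerator, and the perceptron-style margin update are all illustrative stand-ins (the actual model uses a much richer syntactic feature set and efficient decoding rather than brute-force enumeration).

```python
# Minimal sketch of discriminative large-margin learning for sentence
# compression with compressed-bigram features. All names and the update
# rule are illustrative assumptions, not the paper's implementation.
from collections import defaultdict
from itertools import combinations

def bigram_features(sentence, kept):
    """Features on adjacent word pairs of the compression ("compressed bigrams").
    `kept` is a sorted tuple of indices into `sentence` marking retained words."""
    feats = defaultdict(float)
    padded = ["<s>"] + [sentence[i] for i in kept] + ["</s>"]
    for a, b in zip(padded, padded[1:]):
        feats[f"bigram={a}_{b}"] += 1.0
    return feats

def score(weights, feats):
    """Linear model: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def candidates(sentence):
    """Enumerate all non-empty subsequences as candidate compressions.
    Exponential; a real system would decode with dynamic programming."""
    n = len(sentence)
    for k in range(1, n + 1):
        yield from combinations(range(n), k)

def train(data, epochs=5, margin=1.0):
    """Online margin training: update whenever the gold compression fails to
    beat the best wrong candidate by `margin` (a hypothetical simplification
    of a large-margin update)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for sentence, gold in data:
            gold_feats = bigram_features(sentence, gold)
            # Find the highest-scoring incorrect candidate.
            best_feats = None
            for kept in candidates(sentence):
                if kept == gold:
                    continue
                feats = bigram_features(sentence, kept)
                if best_feats is None or score(weights, feats) > score(weights, best_feats):
                    best_feats = feats
            # Perceptron-style update: toward gold, away from the rival.
            if score(weights, gold_feats) - score(weights, best_feats) < margin:
                for f, v in gold_feats.items():
                    weights[f] += v
                for f, v in best_feats.items():
                    weights[f] -= v
    return dict(weights)

if __name__ == "__main__":
    # Toy example: learn to compress by dropping the adverb.
    data = [(["the", "very", "big", "dog", "barked"], (0, 2, 3, 4))]
    w = train(data)
    sent, _ = data[0]
    pred = max(candidates(sent), key=lambda k: score(w, bigram_features(sent, k)))
    print("predicted:", [sent[i] for i in pred])
```

Because the objective penalizes only the score gap between the gold compression and its rivals, the learner can down-weight unreliable (e.g., noise-derived) features whenever they hurt compression accuracy, which is the intuition behind treating parser output as soft evidence rather than gold standard.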