Text Compression by Syntactic Pruning

We present a method for text compression that relies on pruning a syntactic tree. The pruning is applied to a complete analysis of each sentence, produced by a French dependency grammar. Sub-trees of the syntactic analysis are pruned when they are labelled with targeted relations. Evaluation is performed on a corpus of sentences that have been manually compressed. The reduction ratio of extracted sentences averages around 70%, and over 74% of the compressed sentences remain grammatical or readable. Since these results were obtained with a limited set of syntactic relations, the approach shows promise for any application that requires text compression, including text summarization.
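The pruning step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the functions `prune`, `linearize`, and `compress`, the relation labels (`adv_mod`, `prep_mod`, and so on), and the English example sentence are all hypothetical and chosen only for illustration. The sketch assumes that each word carries the label of the dependency relation linking it to its head, and that a sub-tree is dropped whenever that label belongs to the targeted set.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One word in a dependency tree (hypothetical representation)."""
    index: int      # position of the word in the original sentence
    word: str
    relation: str   # label of the relation linking this word to its head
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, targeted: Set[str]) -> Node:
    """Return a copy of the tree without sub-trees whose relation is targeted."""
    kept = [prune(c, targeted) for c in node.children
            if c.relation not in targeted]
    return Node(node.index, node.word, node.relation, kept)


def linearize(node: Node) -> List[str]:
    """Collect the words of the tree in their original sentence order."""
    nodes, stack = [], [node]
    while stack:
        current = stack.pop()
        nodes.append(current)
        stack.extend(current.children)
    return [n.word for n in sorted(nodes, key=lambda n: n.index)]


def compress(root: Node, targeted: Set[str]) -> str:
    """Prune the targeted sub-trees and rebuild the sentence."""
    return " ".join(linearize(prune(root, targeted)))


# Hand-built analysis of "The cat slept quietly on the sofa".
# The relation labels below are assumptions, not the grammar's actual labels.
root = Node(2, "slept", "root", [
    Node(1, "cat", "subj", [Node(0, "The", "det")]),
    Node(3, "quietly", "adv_mod"),
    Node(4, "on", "prep_mod", [
        Node(6, "sofa", "obj", [Node(5, "the", "det")]),
    ]),
])

print(compress(root, {"adv_mod", "prep_mod"}))
```

In this sketch the adverbial and prepositional modifiers are removed together with everything they dominate, so the determiner and noun under the preposition disappear as well. Pruning only on the relation label keeps the procedure simple, at the cost of occasionally removing a modifier that the sentence needs in order to stay readable.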
