Knowledge-poor Multilingual Sentence Compression

We present a feature-based method for sentence compression. First, a summary is created by our summarization method, which is based on latent semantic analysis. The compression approach then removes unimportant clauses from the summary sentences. For each sentence, a set of possible compressed forms (compression candidates) is created. The candidates are then classified, using eight proposed features, into two classes: those in which compression removed important information, and those in which the information is retained. The shortest candidate from the latter class replaces the full sentence in the summary. The features are knowledge-poor, which allows them to work with any language, and the method can easily be extended with other features.
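The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a sentence has already been split into clauses, that candidates are all order-preserving subsets of those clauses, and that length is measured in words. The names `compression_candidates`, `compress`, and `keeps_information` are hypothetical, and `keeps_information` stands in for the feature-based classifier, whose eight features are not specified here.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Candidate = List[str]


def compression_candidates(clauses: Sequence[str]) -> List[Candidate]:
    """Return every non-empty, order-preserving subset of the clauses.

    Assumption: the abstract does not say how candidates are generated.
    """
    n = len(clauses)
    candidates = []
    for size in range(1, n + 1):
        for kept in combinations(range(n), size):
            candidates.append([clauses[i] for i in kept])
    return candidates


def compress(
    clauses: Sequence[str],
    keeps_information: Callable[[Candidate], bool],
) -> Candidate:
    """Pick the shortest candidate that the classifier accepts.

    `keeps_information` is a hypothetical stand-in for the classifier.
    If no candidate is accepted, the full sentence is kept.
    """
    accepted = [c for c in compression_candidates(clauses) if keeps_information(c)]
    if not accepted:
        return list(clauses)
    # Length in words is an assumption made for this sketch.
    return min(accepted, key=lambda c: sum(len(part.split()) for part in c))


# Toy usage: a candidate "keeps the information" if it retains the main clause.
clauses = ["The method removes clauses", "which are unimportant", "as we show later"]
result = compress(clauses, lambda c: clauses[0] in c)
print(result)
```

In this toy run the stand-in classifier accepts any candidate that contains the main clause, so the shortest accepted candidate is the main clause alone.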
