Headline extraction based on a combination of uni- and multidocument summarization techniques

The TNO system for multi-document summarisation is based on an extraction approach. For headline generation, we extended the system to extract the most informative topical noun phrase. The cluster topic is defined as the most frequent term occurring in the most salient document sentences. The core of the system is a probabilistic model that estimates the log-odds of salience from a number of features, including sentence position, sentence length, cue phrases and a language-model-based content score. The parameters of the model were estimated on annotated training data.
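
As a rough illustration of the pipeline described above, the following minimal sketch scores sentences with a linear log-odds model and then takes the most frequent term among the top-scoring sentences as the cluster topic. Everything concrete in it is an assumption made for illustration: the weights, the cue-phrase list, the stopword list, the unigram relative-frequency stand-in for the language-model content score, and the function names `salience_log_odds` and `cluster_topic`. The actual parameters were estimated on annotated training data, and the noun-phrase extraction step is not shown.

```python
import re
from collections import Counter

# Hypothetical weights, chosen by hand for this sketch only.
WEIGHTS = {"bias": -1.0, "position": 1.5, "length": 0.5, "cue": 1.0, "content": 2.0}
CUE_PHRASES = ("in conclusion", "in summary")  # illustrative cue phrases
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for", "by"}


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def salience_log_odds(sentence, index, total, term_probs):
    """Linear log-odds score built from the four feature types named in the text."""
    tokens = tokenize(sentence)
    position = 1.0 - index / max(total, 1)      # earlier sentences score higher
    length = min(len(tokens) / 20.0, 1.0)       # capped, normalised length
    cue = 1.0 if any(p in sentence.lower() for p in CUE_PHRASES) else 0.0
    # Assumed stand-in for the content score: mean unigram probability.
    content = sum(term_probs.get(t, 0.0) for t in tokens) / max(len(tokens), 1)
    return (WEIGHTS["bias"]
            + WEIGHTS["position"] * position
            + WEIGHTS["length"] * length
            + WEIGHTS["cue"] * cue
            + WEIGHTS["content"] * content)


def cluster_topic(sentences, top_k=3):
    """Return the most frequent non-stopword term in the top_k most salient sentences."""
    counts = Counter(t for s in sentences for t in tokenize(s) if t not in STOPWORDS)
    n = sum(counts.values()) or 1
    term_probs = {t: c / n for t, c in counts.items()}
    ranked = sorted(
        enumerate(sentences),
        key=lambda pair: salience_log_odds(pair[1], pair[0], len(sentences), term_probs),
        reverse=True,
    )
    salient = [s for _, s in ranked[:top_k]]
    topic_counts = Counter(t for s in salient for t in tokenize(s) if t not in STOPWORDS)
    return topic_counts.most_common(1)[0][0] if topic_counts else None
```

A headline extractor along these lines would then pick, from the salient sentences, a noun phrase containing the returned topic term; ranking candidate phrases by informativeness is left out of this sketch.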