Multi-document Summarization Using Support Vector Regression

Most multi-document summarization systems follow the extractive framework based on various features. While more and more sophisticated features are designed, the reasonable combination of features becomes a challenge. Usually the features are combined by a linear function whose weights are tuned manually. In this task, Support Vector Regression (SVR) model is used for automatically combining the features and scoring the sentences. Two important problems are inevitably involved. The first one is how to acquire the training data. Several automatic generation methods are introduced based on the standard reference summaries generated by human. Another indispensable problem in SVR application is feature selection, where various features will be picked out and combined into different feature sets to be tested. With the aid of DUC 2005 and 2006 data sets, comprehensive experiments are conducted with consideration of various SVR kernels and feature sets. Then the trained SVR model is used in the main task of DUC 2007 to get the extractive summaries.