Thomson Reuters at TAC 2008: Aggressive Filtering with FastSum for Update and Opinion Summarization

At TAC 2008, we participated in the main task (Update Summarization) and in the Sentiment Summarization pilot task. We modified the FastSum system (Schilder and Kondadadi, 2008), adding more aggressive filtering to adapt it to both update summarization and sentiment summarization. For the Update Summarization task, we show that a classifier that identifies sentences resembling typical first sentences of a news article improves the overall linguistic quality of the generated summaries. For the Sentiment Summarization pilot task, we use a simple sentiment classifier based on a gazetteer of positive and negative sentiment words derived from the General Inquirer and other sources. The classifier is used to produce opinion-based summaries for a collection of blog posts, given a set of positive and negative questions.
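As a rough illustration of the gazetteer-based sentiment filtering described above, the following minimal sketch labels a sentence by counting matches against positive and negative word lists. The tiny word lists, the function names, and the count-difference scoring rule are all assumptions made for illustration; the actual system draws its gazetteer from the General Inquirer and other sources, and its scoring details are not given here.

```python
import re

# Hypothetical toy gazetteer for illustration only; the system described
# above derives its word lists from the General Inquirer and other sources.
POSITIVE = {"good", "great", "excellent", "love", "reliable"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "broken"}


def sentence_polarity(sentence):
    """Label a sentence by comparing counts of positive and negative words.

    The count-difference rule is an assumption, not the paper's method.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def filter_by_polarity(sentences, wanted):
    """Keep only sentences whose polarity matches the question's polarity."""
    return [s for s in sentences if sentence_polarity(s) == wanted]
```

For a positive question, one might call `filter_by_polarity(blog_sentences, "positive")` and pass only the surviving sentences on to the summarizer, which is one plausible reading of "aggressive filtering" in this setting.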

[1]  Jack G. Conrad, et al. Opinion mining in legal blogs, 2007, ICAIL.

[2]  Philip J. Stone, et al. Extracting Information. (Book Reviews: The General Inquirer. A Computer Approach to Content Analysis), 1967.

[3]  Huan Liu, et al. Blogosphere: research issues, tools, and applications, 2008, SKDD.

[4]  Hsin-Hsi Chen, et al. Overview of Opinion Analysis Pilot Task at NTCIR-6, 2007, NTCIR.

[5]  Marshall S. Smith, et al. The general inquirer: A computer approach to content analysis, 1967.

[6]  Claire Cardie, et al. Combining Low-Level and Summary Representations of Opinions for Multi-Perspective Question Answering, 2003, New Directions in Question Answering.

[7]  Claire Grover, et al. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC, 2006.

[8]  Thorsten Joachims, et al. Optimizing search engines using clickthrough data, 2002, KDD.

[9]  Ani Nenkova, et al. Evaluating Content Selection in Summarization: The Pyramid Method, 2004, NAACL.

[10]  Jack G. Conrad, et al. Professional credibility: authority on the web, 2008, WICOW '08.

[11]  Hsin-Hsi Chen, et al. Opinion Analysis Across Languages: An Overview of and Observations from the NTCIR6 Opinion Analysis Pilot Task, 2007, WILF.

[12]  Frank Schilder, et al. FastSum: Fast and Accurate Query-based Multi-document Summarization, 2008, ACL.

[13]  Danushka Bollegala, et al. Measuring semantic similarity between words using web search engines, 2007, WWW '07.

[14]  Eduard H. Hovy, et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics, 2003, NAACL.

[15]  Jun-ichi Fukumoto, et al. Automated Summarization Evaluation with Basic Elements, 2006, LREC.

[16]  R. Tibshirani, et al. Least angle regression, 2004, math/0406456.

[17]  Susan T. Dumais, et al. What should blog search look like?, 2008, SSM '08.

[18]  Sujian Li, et al. Multi-document Summarization Using Support Vector Regression, 2007.

[19]  Swapna Somasundaran, et al. QA with Attitude: Exploiting Opinion Type Analysis for Improving Question Answering in On-line Discussions and the News, 2007, ICWSM.

[20]  Hsin-Hsi Chen, et al. Opinion Extraction, Summarization and Tracking in News and Blog Corpora, 2006, AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs.