Micropinion generation: an unsupervised approach to generating ultra-concise summaries of opinions

This paper presents a new unsupervised approach to generating ultra-concise summaries of opinions. We formulate the problem of generating such a micropinion summary as an optimization problem, where we seek a set of concise and non-redundant phrases that are readable and represent key opinions in text. We measure representativeness based on a modified mutual information function and model readability with an n-gram language model. We propose some heuristic algorithms to efficiently solve this optimization problem. Evaluation results show that our unsupervised approach outperforms other state of the art summarization methods and the generated summaries are informative and readable.

[1]  R. Real,et al.  The Probabilistic Basis of Jaccard's Index of Similarity , 1996 .

[2]  Carl Gutwin,et al.  Domain-Specific Keyphrase Extraction , 1999, IJCAI.

[3]  Carl Gutwin,et al.  KEA: practical automatic keyphrase extraction , 1999, DL '99.

[4]  Dragomir R. Radev,et al.  Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies , 2000, ArXiv.

[5]  Bo Pang,et al.  Thumbs up? Sentiment Classification using Machine Learning Techniques , 2002, EMNLP.

[6]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[7]  Matthew Hurst,et al.  A Language Model Approach to Keyphrase Extraction , 2003, ACL 2003.

[8]  Charles L. A. Clarke,et al.  Frequency Estimates for Statistical Word Similarity Measures , 2003, NAACL.

[9]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[10]  Jianyong Wang,et al.  Mining sequential patterns by pattern-growth: the PrefixSpan approach , 2004, IEEE Transactions on Knowledge and Data Engineering.

[11]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[12]  Bing Liu,et al.  Opinion observer: analyzing and comparing opinions on the Web , 2005, WWW '05.

[13]  Bo Pang,et al.  Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.

[14]  Janyce Wiebe,et al.  Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis , 2005, HLT.

[15]  Hoa Trang Dang,et al.  Overview of DUC 2005 , 2005 .

[16]  Jackie Chi Kit Cheung,et al.  Multi-Document Summarization of Evaluative Text , 2013, EACL.

[17]  Regina Barzilay,et al.  Multiple Aspect Ranking Using the Good Grief Algorithm , 2007, NAACL.

[18]  Bing Liu,et al.  Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data , 2006, Data-Centric Systems and Applications.

[19]  Jackie Chi Kit Cheung,et al.  Extractive vs. NLG-based Abstractive Summarization of Evaluative Text: The Effect of Corpus Controversiality , 2008, INLG.

[20]  Regina Barzilay,et al.  Learning Document-Level Semantic Properties from Free-Text Annotations , 2008, ACL.

[21]  Ivan Titov,et al.  A Joint Model of Text and Aspect Ratings for Sentiment Summarization , 2008, ACL.

[22]  Ian H. Witten,et al.  Domain-independent automatic keyphrase indexing with small training sets , 2008, J. Assoc. Inf. Sci. Technol..

[23]  Yue Lu,et al.  Rated aspect summarization of short comments , 2009, WWW '09.

[24]  ChengXiang Zhai,et al.  Generating comparative summaries of contradictory opinions in text , 2009, CIKM.

[25]  Jiawei Han,et al.  Opinosis: A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions , 2010, COLING.

[26]  Katja Filippova,et al.  Multi-Sentence Compression: Finding Shortest Paths in Word Graphs , 2010, COLING.

[27]  Michael J. Paul,et al.  Summarizing Contrastive Viewpoints in Opinionated Text , 2010, EMNLP.

[28]  Yue Lu,et al.  Latent aspect rating analysis on review text data: a rating regression approach , 2010, KDD.

[29]  Patrick Haffner,et al.  Capturing the Stars: Predicting Ratings for Service and Product Reviews , 2010, HLT-NAACL 2010.

[30]  Xiaolong Li,et al.  An Overview of Microsoft Web N-gram Corpus and Applications , 2010, NAACL.