Microsummarization of Online Reviews: An Experimental Study

Mobile and location-based social media applications provide platforms for users to share brief opinions about products, venues, and services. These quickly typed opinions, or micro-reviews, are a valuable source of current sentiment on a wide variety of subjects. However, there is currently little research on how to mine this information to present it back to users in easily consumable way. In this paper, we introduce the task of microsummarization, which combines sentiment analysis, summarization, and entity recognition in order to surface key content to users. We explore unsupervised and supervised methods for this task, and find we can reliably extract relevant entities and the sentiment targeted towards them using crowd-sourced labels as supervision. In an end-to-end evaluation, we find our best-performing system is vastly preferred by judges over a traditional extractive summarization approach. This work motivates an entirely new approach to summarization, incorporating both sentiment analysis and item extraction for modernized, at-a-glance presentation of public opinion.

[1]  Noémie Elhadad,et al.  An Unsupervised Aspect-Sentiment Model for Online Reviews , 2010, NAACL.

[2]  Jason Weston,et al.  Natural Language Processing (Almost) from Scratch , 2011, J. Mach. Learn. Res..

[3]  Dragomir R. Radev,et al.  Using Random Walks for Question-focused Sentence Retrieval , 2005, HLT.

[4]  Ivan Titov,et al.  A Joint Model of Text and Aspect Ratings for Sentiment Summarization , 2008, ACL.

[5]  Valentin I. Spitkovsky,et al.  Unsupervised Dependency Parsing without Gold Part-of-Speech Tags , 2011, EMNLP.

[6]  Ming Zhou,et al.  Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification , 2014, ACL.

[7]  Xavier Carreras,et al.  Simple Semi-supervised Dependency Parsing , 2008, ACL.

[8]  Panayiotis Tsaparas,et al.  Review Synthesis for Micro-Review Summarization , 2015, WSDM.

[9]  Ani Nenkova,et al.  The Impact of Frequency on Summarization , 2005 .

[10]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.

[11]  Dragomir R. Radev,et al.  Scientific Paper Summarization Using Citation Summary Networks , 2008, COLING.

[12]  Regina Barzilay,et al.  Learning Document-Level Semantic Properties from Free-Text Annotations , 2008, ACL.

[13]  Ming Zhou,et al.  Recognizing Named Entities in Tweets , 2011, ACL.

[14]  Dean P. Foster,et al.  Two Step CCA: A new spectral method for estimating vector models of words , 2012, ICML 2012.

[15]  Yang Liu,et al.  Why is “SXSW” trending? Exploring Multiple Text Sources for Twitter Topic Summarization , 2011 .

[16]  Kalina Bontcheva,et al.  Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data , 2013, RANLP.

[17]  Oren Etzioni,et al.  Named Entity Recognition in Tweets: An Experimental Study , 2011, EMNLP.

[18]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[19]  Brendan T. O'Connor,et al.  TweetMotif: Exploratory Search and Topic Summarization for Twitter , 2010, ICWSM.

[20]  Eric K. Ringger,et al.  Pulse: Mining Customer Opinions from Free Text , 2005, IDA.

[21]  Lucy Vanderwende,et al.  Exploring Content Models for Multi-Document Summarization , 2009, NAACL.

[22]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[23]  David M. Blei,et al.  Supervised Topic Models , 2007, NIPS.

[24]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[25]  Deepayan Chakrabarti,et al.  Event Summarization Using Tweets , 2011, ICWSM.

[26]  Dan Klein,et al.  Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.

[27]  William W. Cohen,et al.  Semi-Markov Conditional Random Fields for Information Extraction , 2004, NIPS.

[28]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[29]  Scott Miller,et al.  Name Tagging with Word Clusters and Discriminative Training , 2004, NAACL.