Gist: general integrated summarization of text and reviews

E-commerce is growing rapidly, and review websites host hundreds of reviews on average for each product. Reading that many reviews is tedious and time-consuming and, with the proposed Gist, unnecessary. We introduce Gist, a system that automatically summarizes large amounts of text into informative, actionable key sentences. Using unsupervised learning and sentiment analysis, Gist selects the sentences that best characterize a set of reviews. It does so in seconds, without prior adjustment or training. Gist extends the current state of the art with a modular design that can take advantage of a priori knowledge. As a general framework, it can summarize any set of texts and can be adapted to specific domains through straightforward modification and extension. A robust comparison with state-of-the-art summarization algorithms, on datasets containing hundreds of documents, demonstrates Gist's ability to summarize text and reviews effectively.
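
To make the idea of selecting key sentences concrete, the following is a minimal sketch of unsupervised extractive summarization. It is not Gist's algorithm, which the abstract does not specify. It assumes a plain centroid-based score: each sentence is ranked by the cosine similarity between its word counts and the word counts of the whole review set. The function names, the tiny stopword list, and the naive sentence splitter are all hypothetical choices made for illustration, and sentiment analysis is left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "is", "a", "and", "was", "i", "it", "of", "to"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z']+", sentence.lower())
            if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two Counter-based term vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def key_sentences(reviews, k=2):
    """Return the k sentences closest to the centroid, in original order."""
    sentences = [s for review in reviews for s in split_sentences(review)]
    vectors = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for vector in vectors:
        centroid.update(vector)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: cosine(vectors[i], centroid),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


reviews = [
    "The battery life is great. Shipping was slow.",
    "Great battery life and a sharp screen.",
    "Battery life is great, I love it.",
]
print(key_sentences(reviews, k=1))
```

Because the score needs no labeled data or training, a sketch like this runs directly on any set of texts, which is the property the abstract emphasizes. A domain-specific variant could swap in a different tokenizer or scoring function without changing the rest.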
