Paper information - Gist: general integrated summarization of text and reviews

E-commerce is growing rapidly, and review Web sites host hundreds of reviews on average for any given product. Reading so many reviews is tedious, time-consuming, and, with the proposed Gist, unnecessary. We introduce Gist, a system that automatically summarizes large amounts of text into informative and actionable key sentences. Using unsupervised learning and sentiment analysis, Gist selects the sentences that best characterize a set of reviews. All of this is done in seconds, without prior adjustment or training. Gist extends the current state of the art with a modular system that can take advantage of a priori knowledge and adapt to new domains through easy modification and extension. As a general framework, Gist can summarize any set of text, not only reviews. A robust comparison with state-of-the-art summarization algorithms, on datasets containing hundreds of documents, demonstrates Gist’s ability to effectively summarize text and reviews.

Iren Valova | Justin Lovinger | Chad Clough
