Combining semantics and social knowledge for news article summarization

With the spread of online newspapers and social media, users can retrieve dozens of news articles covering the same topic in a short time. News article summarization is the task of automatically selecting a relevant subset of the articles' sentences that users can easily explore. Two promising research directions in this field are the use of semantics-based models (e.g., ontologies and taxonomies) to identify key document topics, and the integration of social data analysis so that users' current interests are also considered during summary generation. This chapter surveys the most recent research advances in document summarization and presents a novel strategy that combines ontology-based and social knowledge to address generic (not query-based) multi-document summarization of news articles. To identify the most salient sentences in the news articles, an ontology-based text analysis is performed during the summarization process. In addition, social content acquired from real Twitter messages is analyzed separately, so that the current interests of social network users are also taken into account during sentence evaluation. The combination of ontological and social knowledge allows the generation of accurate and easy-to-read news summaries. In experiments on real news articles and Twitter messages, the proposed summarizer performs better than the evaluated competitors.
