SocialTree: Socially Augmented Structured Summaries of News Stories

News story understanding entails having an effective summary of a related group of articles that may span different time ranges, involve different topics and entities, and have connections to other stories. In this work, we present an approach to efficiently extract structured summaries of news stories by augmenting news media with the structure of social discourse, as reflected in social media in the form of social tags. Existing event detection, topic modeling, clustering, and summarization methods yield news story summaries based only on noun phrases and named entities. These representations are sensitive to the article wording and to the keyword extraction algorithm. Moreover, keyword-based representations are rarely helpful for highlighting inter-story connections or for reflecting the inner structure of a news story, because of high word ambiguity and clutter from the large variety of keywords describing news stories. Our method combines the news and social media domains to create structured summaries of news stories in the form of hierarchies of keywords and social tags, named SocialTree. We show that the properties of social tags can be exploited to augment the construction of hierarchical summaries of news stories and to alleviate the weaknesses of existing keyword-based representations. In our quantitative and qualitative evaluation, the proposed method strongly outperforms the state of the art with regard to both coverage and informativeness of the summaries.
