Exploring Significant Interactions in Live News

News monitoring serves to detect current news and track developing stories, but also to explore what is being talked about. In this article, we present an approach for monitoring live feeds of news articles and detecting significant (co-)occurrences of terms relative to a learned background corpus. We visualize the result as a graph-structured semantic word cloud that uses a layout based on stochastic neighbor embedding (SNE) and draws edges between related terms. We give visual examples from our prototype, which processes news articles as they are crawled from dozens of news sites.
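
To make the idea of a "significant co-occurrence relative to a background corpus" concrete, here is a minimal sketch. It is not the method of this article: it assumes a simple binomial z-score over document-level pair frequencies, and the function names, smoothing constant, and threshold are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pair_counts(docs):
    """Count, for each unordered term pair, the number of documents containing both."""
    counts = Counter()
    for terms in docs:
        # sorted(set(...)) gives each pair one canonical, alphabetical form
        for pair in combinations(sorted(set(terms)), 2):
            counts[pair] += 1
    return counts


def significant_pairs(background, current, threshold=3.0):
    """Return {pair: z} for pairs that are unusually frequent in `current`.

    Assumption (illustrative only): the background rate of a pair is a
    smoothed document frequency, and significance is a binomial z-score.
    """
    bg = pair_counts(background)
    cur = pair_counts(current)
    n_bg, n_cur = len(background), len(current)
    result = {}
    for pair, observed in cur.items():
        # Smoothed background probability, so unseen pairs do not get p = 0
        p = (bg[pair] + 0.5) / (n_bg + 1.0)
        expected = n_cur * p
        std = math.sqrt(n_cur * p * (1.0 - p))
        z = (observed - expected) / std
        if z >= threshold:
            result[pair] = z
    return result


# Toy data: each document is a list of already extracted terms.
background = [["election", "vote"]] * 50 + [["storm", "coast"]] * 50
current = [["storm", "election"]] * 5 + [["election", "vote"]] * 5
hits = significant_pairs(background, current)
```

In this toy data, the pair of "election" and "storm" never co-occurs in the background, so it stands out in the current window, while "election" and "vote" co-occur at their usual rate and are not flagged. A streaming system would additionally need to update the background statistics over time and bound memory use, which this sketch does not attempt.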
