Connecting the dots between news articles

Extracting useful knowledge from large datasets has become one of the most pressing problems in today's society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom struggle to keep up with the ever-growing amount of content published every day. With this much data, it is easy to miss the big picture. In this paper, we investigate methods for automatically connecting the dots -- providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007) and ending with the ongoing health-care debate. We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithm's effectiveness in helping users understand the news.
