Rookie: A unique approach for exploring news archives

News archives are an invaluable primary source for placing current events in historical context. But current search engine tools do a poor job of uncovering broad themes and narratives across documents. We present Rookie: a practical software system that uses natural language processing (NLP) to help readers, reporters, and editors uncover broad stories in news archives. Unlike prior work, Rookie's design emerged from 18 months of iterative development in consultation with editors and computational journalists. This process led to a dramatically different approach from previous academic systems with similar goals. Our efforts offer a generalizable case study for others building real-world journalism software using NLP.
