Finding New Information Via Robust Entity Detection

Journalists and editors work under pressure to collect relevant details and background information about specific events. They spend a significant amount of time sifting through documents to find new information such as facts, opinions, or stakeholders (i.e., people, places, and organizations that have a stake in the news). Spotting this information is a tedious and cognitively intense process. One task essential to this process is finding and keeping track of stakeholders, which is both cognitively taxing and memory-intensive. Tell Me More offers automatic support for this task. Given a seed story, Tell Me More mines the web for similar stories reported by different sources and selects only those stories that offer new information with respect to the seed story. Just as for a journalist, detecting named entities is central to the system's success. In this paper we briefly describe Tell Me More and focus in particular on its entity detection component. We describe an approach that combines off-the-shelf named entity recognizers (NERs) with WPED, a publicly available NER that uses Wikipedia as its knowledge base. We show a significant increase in precision with respect to traditional NERs. Lastly, we present an overall evaluation of Tell Me More using this approach.
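The abstract does not specify how the recognizers' outputs are merged or how novelty is scored. The following is a minimal sketch under two stated assumptions: each recognizer returns a list of mention strings, and a candidate story's novelty is the number of entities it mentions that the seed story does not. The function names (`combine_recognizers`, `new_entities`, `select_novel_stories`) and the agreement-voting rule are hypothetical illustrations, not the Tell Me More or WPED interface.

```python
from collections import Counter


def normalize(mention: str) -> str:
    """Case-fold and collapse whitespace so 'Barack  Obama' matches 'barack obama'."""
    return " ".join(mention.lower().split())


def combine_recognizers(outputs: list[list[str]], min_votes: int = 2) -> set[str]:
    """Keep a mention only if at least `min_votes` recognizers report it.

    Assumption: a generic voting scheme used for illustration; requiring agreement
    trades recall for precision. It is not necessarily how Tell Me More combines
    its off-the-shelf NERs with WPED.
    """
    votes = Counter()
    for mentions in outputs:
        # A set ensures each recognizer votes at most once per normalized mention.
        votes.update({normalize(m) for m in mentions})
    return {m for m, n in votes.items() if n >= min_votes}


def new_entities(seed: set[str], candidate: set[str]) -> set[str]:
    """Entities mentioned in a candidate story but absent from the seed story."""
    return candidate - seed


def select_novel_stories(
    seed: set[str], stories: dict[str, set[str]], min_new: int = 1
) -> list[str]:
    """Return ids of stories with at least `min_new` unseen entities, most novel first."""
    scored = [(len(new_entities(seed, ents)), sid) for sid, ents in stories.items()]
    # Sort by descending novelty, breaking ties by story id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for n, sid in scored if n >= min_new]


if __name__ == "__main__":
    seed = combine_recognizers(
        [["Barack Obama", "Chicago"], ["barack obama", "Chicago", "Illinois"]]
    )
    print(sorted(seed))  # ['barack obama', 'chicago']
    stories = {
        "a": {"barack obama", "chicago"},
        "b": {"barack obama", "rahm emanuel"},
        "c": {"rahm emanuel", "illinois", "chicago"},
    }
    print(select_novel_stories(seed, stories))  # ['c', 'b']
```

In this toy run, "Illinois" is dropped from the seed entities because only one recognizer reports it. Story "a" is filtered out because it repeats the seed's entities, which is the "more of the same" case the system is meant to avoid. Raising `min_votes` makes the entity filter stricter, and raising `min_new` makes the novelty filter stricter.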
