“Making the News”: Identifying Noteworthy Events in News Articles

Most events described in a news article are background events: only a small number are noteworthy, and an even smaller number serve as the trigger for writing the article. Although these events are difficult to identify, they are crucial to NLP tasks such as first story detection, document summarization, and event coreference, and to many applications of event analysis that depend on counting events and identifying trends. In this work, we address this problem by introducing the notion of a news-peg, a concept borrowed from the political science literature. A news-peg is an event that prompted the author to write the article, and it serves as a more fine-grained measure of noteworthiness than a summary. We describe a new task of news-peg identification and release an annotated dataset for its evaluation. We formalize an operational definition of a news-peg, on which human annotators achieve high inter-annotator agreement (over 80%), and present a rule-based system for this task that exploits syntactic features derived from established journalistic devices.
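The abstract reports agreement "over 80%" but does not name the statistic. As a minimal sketch, assuming the figure is simple observed (percentage) agreement, where two annotators each mark which event in an article is the news-peg, the computation looks like the following. The function name `percent_agreement` and the event IDs are hypothetical illustrations, not data or code from the paper.

```python
from typing import Sequence


def percent_agreement(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Percentage of items on which two annotators chose the same label.

    Assumption: agreement is plain observed agreement, not a chance-corrected
    statistic such as Cohen's kappa (the abstract does not specify).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    if not labels_a:
        raise ValueError("no items to compare")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Hypothetical toy data: for each of five articles, the ID of the event
# each annotator marked as the news-peg (not taken from the paper's dataset).
annotator_1 = ["e3", "e1", "e7", "e2", "e5"]
annotator_2 = ["e3", "e1", "e4", "e2", "e5"]

print(f"{percent_agreement(annotator_1, annotator_2):.1f}%")  # 80.0%
```

Observed agreement is easy to read but does not correct for agreement expected by chance; a chance-corrected measure such as Cohen's kappa would typically be reported alongside it.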
