Incident threading for news passages

With an overwhelming volume of news reports currently available, there is an increasing need for automatic techniques to analyze and present news to a general reader in a meaningful and efficient manner. We explore incident threading as a possible solution to this problem. All text that describes the occurrence of a real-world happening is merged into a news incident, and incidents are organized in a network with dependencies of predefined types. Earlier attempts at this problem have assumed that a news story covers a single topic. We move beyond that limitation to introduce passage threading, which processes news at the passage level. First we develop a new testbed for this research and extend the evaluation methods to address new granularity issues. Then a three-stage algorithm is described that identifies on-subject passages, groups them into incidents, and establishes links between related incidents. Finally, we observe significant improvement over earlier work when we optimize the harmonic mean of the appropriate evaluation measures. The resulting performance exceeds the level that a calibration study shows is necessary to support a reading comprehension task.
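As a minimal sketch of what optimizing the harmonic mean of two evaluation measures involves, the snippet below scores a few candidate system settings and picks the one with the highest harmonic mean. The abstract does not say which measures are combined or how settings are searched, so the names `harmonic_mean` and `best_setting`, the setting labels, and the scores are all hypothetical illustrations, not the authors' procedure.

```python
from typing import Dict, Tuple


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores; 0.0 if either score is 0."""
    if a < 0 or b < 0:
        raise ValueError("scores must be non-negative")
    if a == 0 or b == 0:
        return 0.0
    return 2 * a * b / (a + b)


def best_setting(candidates: Dict[str, Tuple[float, float]]) -> str:
    """Return the key whose pair of scores has the largest harmonic mean.

    `candidates` maps a (hypothetical) setting name to two evaluation
    scores, e.g. one measure for incident grouping and one for linking.
    """
    return max(candidates, key=lambda name: harmonic_mean(*candidates[name]))


# Illustrative, made-up scores: the harmonic mean penalizes imbalance,
# so the balanced setting wins over the lopsided one.
scores = {"lopsided": (0.9, 0.1), "balanced": (0.5, 0.5)}
chosen = best_setting(scores)
```

Because the harmonic mean is dominated by the lower of the two scores, a setting cannot score well by excelling on one measure while neglecting the other.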
