Fine-grained Structure-based News Genre Categorization

Journalists usually organize and present the contents of a news article following a well-defined structure. In this work, we propose a new task: categorizing news articles based on their content presentation structures, a categorization that can benefit various NLP applications. We first define a small set of news elements, considering their functions in a news story (e.g., introducing the main story or event, catching the reader’s attention, and providing details) and their writing style (narrative or expository). We then formally define four commonly used news article structures based on how they select and organize these news elements. We create an annotated dataset for structure-based news genre identification and, finally, build a predictive model that uses structure-indicative features to assess the feasibility of this classification task.
