Rhetorical browsing in journalistic texts: Preliminary investigations

The work presented in this paper concerns discourse structure analysis and its applications to intra- and inter-document search. In a typical application, which could be called “rhetorical browsing”, the system helps a newspaper reader focus on texts and passages presenting a certain kind of information or commentary, according to his/her current interest: this may be raw information, possibly with a chronological dimension, or, on the contrary, analyses, recommendations, debates, etc. The discourse model can be related to Swales's “discourse moves” and the derived “argumentative zoning” procedures for scientific documents. However, due to the nature of the texts considered, zones are defined in more “generalist” terms, following the classic Narration-Description-Argumentation-Prescription typology and especially C. Smith's notion of “discourse modes”. The paper presents some preliminary steps taken to test the feasibility of the project. First, in order to ground our research on firm observations, we built a corpus of journalistic texts annotated according to the discourse model in view. These annotations yielded quantified results concerning the organization of discourse modes within texts. Second, an experimental procedure for automatically tagging text passages according to discourse modes was designed, implemented and tested on the corpus.
