Toward Automatic Detection of Discourse Segmentation Markers (Vers une recherche automatique des marqueurs de la segmentation du discours)

To study linguistic expressions that signal thematic breaks in large text corpora, automatic procedures for identifying these breaks are indispensable. In the present study, we test the effectiveness of four cohesion indices whose calculation can be automated. We show that these indices make it possible to differentiate between three categories of temporal segmentation markers.
