What is this text about?

Most work in text retrieval aims to present the information held by several texts, giving entry points into these texts and allowing navigation between them. Less attention has been paid to defining principles for accessing the content of a single document. Because most information retrieval systems return documents in response to an initial query made of words, the usual solution is to present document titles and to highlight the query words inside a passage or in the whole document. Such a presentation does not support rapid reading, and systems should not settle for it. Our studies lead us to provide indicative and informative views of texts, as summarization systems do. We offer the user different levels of abstraction of a text. The first is a global overview, in which global topics are indicated and positioned in the text. The second goes deeper into the topic description by adding local topics and information about the argumentative role of the segments. In this paper, we detail the extraction of thematic descriptors and meta-descriptors, which relies on recurrence (within a text and within the corpus, respectively), and show how their characterization provides the segment structuring.
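The idea of recurrence-based extraction can be illustrated with a minimal sketch. It is not the method of the paper, whose details are not given in this abstract: the tokenizer, the function names `thematic_descriptors` and `meta_descriptors`, and the thresholds `min_count` and `min_docs` are all assumptions made for illustration. The sketch treats a word as a thematic descriptor when it recurs within one text, and as a meta-descriptor when it recurs across documents of the corpus.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic tokens; a crude stand-in for real linguistic preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def thematic_descriptors(text, min_count=2, stopwords=frozenset()):
    """Words that recur inside a single text (min_count is a hypothetical threshold)."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return {word for word, n in counts.items() if n >= min_count}


def meta_descriptors(corpus, min_docs=2, stopwords=frozenset()):
    """Words that recur across documents, measured by document frequency
    (min_docs is a hypothetical threshold)."""
    doc_freq = Counter()
    for doc in corpus:
        # Count each word at most once per document.
        doc_freq.update(set(tokenize(doc)) - set(stopwords))
    return {word for word, n in doc_freq.items() if n >= min_docs}


corpus = [
    "The engine failed. The engine was replaced.",
    "In conclusion, the pump failed.",
    "In conclusion, the valve held.",
]
stop = {"the", "in", "was"}
print(thematic_descriptors(corpus[0], stopwords=stop))
print(meta_descriptors(corpus, stopwords=stop))
```

In this toy corpus, "engine" recurs within the first text, while "conclusion" and "failed" recur across texts. A real system would add lemmatization, multi-word terms, and some means of characterizing each descriptor before using it to structure segments.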
