Paper information - Identifying and expanding titles in web texts

In this paper, we present an analysis based on linguistic and typographic features that allows for the identification of titles in web documents. We focus in particular on procedural texts. Identifying titles is a difficult task because the ways of encoding them are very diverse. A number of titles are also incomplete because they depend on context; we therefore propose a way to retrieve the missing elements, in particular predicates, so that titles are fully intelligible.

Patrick Saint-Dizier | Estelle Delpech | Clémentine Adam

[1] Patrick Saint-Dizier, et al. Investigating the Structure of Procedural Texts for Answering How-to Questions, 2008, LREC.

[2] Jason Eisner, et al. Lexical Semantics, 2020, The Handbook of English Linguistics.

[3] Marie-Paule Jacques. Approche en discours de la réduction des termes complexes dans les textes spécialisés, 2003.

[4] E. Engdahl, et al. The linguistic realization of information packaging, 2013.

[5] Guy Lapalme, et al. Choosing Rhetorical Structures To Plan Instructional Texts, 2000, Comput. Intell.

[6] Marti A. Hearst. TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages, 1997, CL.