Automatic classification and categorization: application to the thematic analysis of textual data

In recent years, several research projects in literature and the humanities have sought to incorporate automated data processing into their objectives. Despite each project's specificities, most share a common goal: understanding the thematic analysis of textual data and automating it. In this paper, we present a data processing sequence adapted to the thematic analysis of textual data. Its distinctive feature is its use of data classification and automatic categorization techniques. We present the results of an experiment on a philosophical corpus.
