Un modèle de données pour la textométrie : contribution à une interopérabilité entre outils (A data model for textometry: a contribution to interoperability between tools)
The research community for textual data analysis is organizing itself to develop textometric tools more effectively and to share the textual data it analyzes. The goal is to make functionalities and data interoperate better. This matters both for clarifying complex software architectures that involve natural language processing tools and for capitalizing on the work invested in preparing input data. To compare the functionalities of the various textometric tools globally, we propose a synthetic functional model composed of four axes: Statistical analysis, Text edition, Search engine, and Text annotation. Several international initiatives exist to standardize the description of textual data (metadata) and the encoding of their content. Because these initiatives are diverse in their application and constantly evolving, we propose a synthetic data model for textometric tools composed of 11 distinct parts. It was built by analyzing the data formats used by existing textometric tools. We propose that the tools interoperate with data described at that level.
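As a purely illustrative sketch (not the paper's actual 11-part data model), the following Python fragment shows the kind of interchange structure the abstract alludes to: corpus-level metadata, texts with their own metadata, and token-level annotations, over which the four functional axes (statistical analysis, text edition, search, annotation) could operate. All class and function names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Token:
    form: str                                            # surface form as it appears in the text
    annotations: Dict[str, str] = field(default_factory=dict)  # e.g. {"pos": "NOUN", "lemma": "outil"}

@dataclass
class Text:
    ident: str                                           # text identifier within the corpus
    metadata: Dict[str, str] = field(default_factory=dict)     # descriptive metadata (author, date, ...)
    tokens: List[Token] = field(default_factory=list)

@dataclass
class Corpus:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)     # corpus-level metadata
    texts: List[Text] = field(default_factory=list)

def frequency_list(corpus: Corpus, annotation: str = "form") -> Dict[str, int]:
    """Toy illustration of the 'Statistical analysis' axis: count a token property."""
    counts: Dict[str, int] = {}
    for text in corpus.texts:
        for token in text.tokens:
            key = token.form if annotation == "form" else token.annotations.get(annotation, "")
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Such a structure is only a minimal assumption of what a shared description level might contain; the point of the proposed model is that tools exchange data at a comparable level of description rather than through tool-specific formats.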